Abstract


The  simulation  of  crowds  of  virtual  characters  is  needed  for  applications  such  as 
films,  games,  and  virtual  reality  environments.  These  simulations  are  difficult  due  to 
the  large  number  of  characters  to  be  simulated  and  the  requirement  for  synthesizing 
realistic  human-like  motion  efficiently.  This  thesis  focuses  on  two  problems:  how  to 
search  through  and  select  motion  clips  of  behaviors  so  that  human-like  motion  can 
be  generated  for  multiple  characters  interactively,  and  how  to  model  and  synthesize 
variation  in  motion  data. 

Given a collection of blendable segmented motion clips derived from motion
capture or keyframed animation, this thesis explores novel ways to apply heuristic
search algorithms to generate goal-driven navigation motion for virtual human-like
characters. Motion clips are organized and interconnected through a behavior
graph that encodes the possible actions of a character. A planning approach
is used to search over these possible actions to efficiently generate motion. This
technique works well for synthesizing animations of multiple characters navigating
autonomously in large dynamic environments.

In addition, this thesis introduces a novel planning approach based on precomputation
that is more efficient than traditional forward search methods. We present a
technique for precomputing large and diverse trees, and describe a backward search
method used during runtime to solve planning queries. This new approach allows us
to develop an interactive animation system that supports a large number of characters
simultaneously.

Finally, this thesis addresses the issue of motion variation. Current state-of-the-art
crowd simulations often use a few specific motion clips or repeated cycles of a
particular motion to continuously animate multiple characters. The idea of synthesizing
the subtle variations in motion data has been largely unexplored, as previous
work considers variation to be an additive noise component. This thesis instead uses
a data-driven approach and applies learning techniques to this problem. Given a
small number of input motions, we model the data with a Dynamic Bayesian Network,
and synthesize new spatial and temporal variants that are statistically similar
to the inputs.
1.1 Films and games are applications that motivate our work. Left: A scene from
the animated film Madagascar. Right: A screenshot from The Sims computer
game.

1.2 Left: A simple behavior graph of high-level motions used in our Behavior
Planning approach. Right: Planned behaviors for 100 animated characters navigating
in a complex environment.

1.3 Left: An example of a precomputed search tree. This is a frequency plot; each
point represents the number of paths that can reach that point from the root of the
tree. The root is near the middle of the figure and the tree progresses in a forward
direction (or up in the figure). This tree has about 220,000 nodes and requires
a storage memory of 10 MB. Right: Screenshot of our interactive system. The
characters respond to user changes interactively while navigating in large and
dynamic environments.

1.4 A DBN for the variables X_1, ..., X_n. Each node X_i represents one DOF in the
motion data. We use the prior network to model the first 2 frames of each input
motion clip. The transition network then models subsequent frames given the
previous 2 frames. We assume a 2nd-order Markov property because it is the
simplest model that works well.

3.1 The range of planning configuration spaces for generating the motions of a
human-like character. Our Behavior Planning approach differs from previous methods
in that it lies in between the two extremes in this range.

3.2 Left: The problem inputs include a description of the environment, a starting
position (larger green dot) and orientation (smaller green dot points toward the
direction), and a goal position (red dot). Right: The output is a motion sequence.

3.3 Left: A simple graph of behaviors. Each node or behavior contains a set of
example motion clips for that behavior. Each edge indicates allowable transitions
between behaviors. Right: An example graph used for a human character that
includes special jumping and crawling behaviors.

3.4 A dynamic environment with a falling tree. Left: Before it falls, the characters
are free to jog normally in the open space. Center: As it is falling, the characters
can neither jog past nor jump over it. Right: After it has fallen, the characters
can jump over it.

4.1 Overview of the system.

4.2 Frequency plot of the precomputed tree. Each point represents the number of
paths that can reach that point from the root of the tree. The root is near the
middle of each figure and the tree progresses in a forward direction (or up in the
figure). The tree covers an area that is approximately a half circle of radius 16
meters, with the character starting at the center of the half circle. The majority
of paths end up in an area between 8 and 14 meters away from the start. We used
about 1,500 frames of motion at 30 Hz. Left: Exhaustive tree of 6 depth levels
built from graph with 21 behavior nodes. This tree has over 6 million nodes (over
300 MB). Right: The pruned tree has 220,000 nodes (about 10 MB).

4.3 Left: An environment gridmap initialized with UNOCCUPIED cells. The intuition
for this gridmap is that if cell (x_i, y_j) is occupied by an obstacle, the tree
nodes corresponding to this cell and their descendant nodes (the black ones) are
BLOCKED. Right: For each node i, we precompute and store the corresponding
values x_i, y_i, and midtime_i.

4.4 Left: In the goal gridmap, each cell contains a sorted list of paths. Each path’s
total cost is the sum of the cost of the motion states. The sorting is based on
this total cost. Since each node in the tree corresponds to a unique path if we
trace the node back towards the root of the tree, we can also say that each goal
cell contains a sorted list of nodes. We will use this gridmap during runtime;
the intuition is that if we know the gridcell that the goal position is in, the paths
or nodes in that cell correspond to the potential solutions. Right: A straightforward
discretization of the goal gridmap may not work well. An “overlapped
discretization” works well.


4.5 Left: We align the coordinate spaces between the environment and the tree.
We translate and rotate the obstacles and the goal position so that the starting
position and orientation (of the character in the environment) match with that of
the precomputed tree. Right: If the size of the gridcell is d, we can guarantee that
the mapping of an obstacle to the environment gridmap is correct if the sampling
of points for the obstacle is at most d apart.


4.6 The 2 columns correspond to the first 2 iterations of the runtime path finding
phase for this example. The top row shows the start (green sphere) in each
iteration, and the sub-goal (red sphere) selected from the coarse-level path. The
bottom row shows the path returned by the runtime path finding algorithm (light
and dark blue) and the partial path chosen (dark blue only). An estimate of the
outline of the precomputed tree is shown. The tree is transformed to the global
space only in the figure to show how it relates to the other parts of the environment.
There is only one precomputed tree, and it is never transformed to the
global space in the algorithm.

4.7 The process of tracing back the list of sorted nodes P towards the root node in
Algorithm 3. Left: The 3 cases under which each trace of node p stops. The
sub-goal is inside the dashed square (a cell of the goal gridmap). Right: Simple
example. The blue nodes are the nodes of the precomputed tree. The sub-goal
is somewhere in the square-shaped box of red nodes. The other colored nodes
correspond to the 3 cases.

4.8 The points of the path that are eventually chosen by the coarse-level planner for
this environment.

4.9 Screenshot of the interactive system. The characters interactively respond to
user changes to obstacles and their respective goal locations while navigating in
a large environment.

4.10 Examples of precomputed trees used in our comparison. All trees have the same
number (826) of nodes. Each tree’s root is at (0,0), and the paths move in a
forward (or up in the figure) direction because the input actions/motions allow
the character to move forward and/or slightly turn left/right. Note that many
paths overlap because of the tree’s structure.

5.1 A DBN for the variables X_1, ..., X_n. Each node X_i represents one DOF in the
motion data. We use the prior network to model the first 2 frames. The transition
network then models subsequent frames given the previous 2 frames. We assume
a second-order Markov property because it is the simplest model that works well.

5.2 When learning the structure for the transition network, we do a cross validation
over each motion sequence. We take each sequence as testing data, and use the
others as training data. For the testing sequence, we take the first two frames
as input and re-synthesize the whole sequence with the given structure. The
newly synthesized sequence is then compared to the original data to evaluate the
structure. This is what we compute intuitively in the scoring function for the
transition network of the DBN.

5.3 We “unroll” the DBN from Figure 5.1 to synthesize new variants. We show here
the unrolled network for 5 time frames. Note that the first two frames come
from the prior network of the DBN and may not contain cycles. Since the DBN
represents a joint probability distribution over the possible trajectories of each
DOF, we sample from this distribution to generate new variants. It is important to
recognize that the synthesized motion does not have a one-to-one correspondence
with any one of the input motions. This means that the synthesized motion is
not just a copy of one of the input motions plus some slight differences, but
the timing of the whole motion itself is different. Furthermore, no new pose is
exactly the same as any previous pose.

5.4 Results for cheering, walk cycle, and swimming motion. In each column, the top
image shows the 4 inputs (overlapped, each with different color) and the bottom
image shows the 15 outputs (overlapped, each with different color). These are
frames from the animations.


5.5 Given the learned structure and just one jumping motion as inputs, we synthesize
four new variant motions. We overlap poses from these four new motions at
similar time phases of the jump. We can see the variations in the poses at these
time phases. The poses for the head vary the least because the head poses also
vary the least in the input data.

5.6 Plots of four inputs (in blue) and fifteen output variants (in black or green) for
cheering motion. Each curve represents one motion clip. Note that these motions
are not cyclic. Left Column: Two selected plots of DOF vs. time. Middle
Column: Two selected plots of DOF vs. DOF. Right Column: Two selected
plots of PCA-dimension vs. PCA-dimension.

5.7 Left: Example frame from walk cycle motion clip with strawman noise added to
left shoulder/arm. The left shoulder turns in a way that does not synchronize
with the right arm. Right: Example frame from walk cycle motion clip with
Perlin noise added to right hip/knee. The right hip and knee pause and move in a
way that does not synchronize with the rest of the walk cycle. In both cases, the
unnaturalness of the timing of the whole walk cycle can be better seen in the
animations.

5.8 Two examples of motion sets that do not work with our approach. There are five
overlapped walk cycles in each case. Four of them are inputs (in blue) that are
similar, and the other one (in magenta) does not fit together with these four. Left:
For the one that does not fit, the arms swing higher than the other four. Right:
For the one that does not fit, the motion turns slightly to the right.

5.9 Left: Plot of frequency versus likelihoods for the training data. Right: We started
with a testing set of eight walk cycles, and our method selected these five to be
similar to the ones in the training set.

6.1 Left: Example frame of resulting animation generated with five original motion
clips. Right: Example frame of resulting animation generated with thirty-two
variants together with the five original clips.

Chapter 1
Introduction


There is a great need to create crowd animations in computer-simulated environments. Films and
games are two main applications that often have scenes with a large number of characters. Figure
1.1 shows some example applications that motivate our work. Animated films such as The Lord
of the Rings and Madagascar [99] require the use of crowd animation systems to autonomously
generate the motions for hundreds of characters. Games such as The Sims require autonomous
control of a large number of characters in real-time.

An ideal animation system should be able to simulate a large number of virtual characters
that behave like a large group of humans in the real world. We would like each character to act
in its unique way and move about with its own purpose. They need to move and interact with
the environment and other characters in a seemingly intelligent way. Each character needs to
move around and avoid obstacles automatically. They should exhibit complex motions in large
and dynamic environments. The synthesis process needs to be efficient enough that a user
can interact with these characters in real-time. In addition, the characters need to exhibit variety
in many ways. Each character must behave in its unique style and move with its own speed.
Each motion should look different even if the same action is performed repeatedly. It would be
realistic to have variation in the body and facial features of each character, in how the characters
are dressed, and in how their hair is styled. We want all these aspects of a character to look
natural. Ideally, we would have efficient algorithms to generate these features.

It would be ideal to have crowd animation systems with all of the above properties.
However, this has not been fully achieved. While many crowd animation systems can generate
motion for characters that can walk on flat ground, it is often difficult to generate more complex
motion with them. For example, in a complex environment with dynamic obstacles and multiple
characters navigating in it, it is difficult to ensure that the characters avoid the obstacles and
each other. Current crowd and game systems sometimes accept slight collisions between the
characters as a tradeoff for runtime efficiency. As the number of characters increases, it becomes
computationally more difficult to guarantee that the characters will not collide with each other.
In addition, crowd systems often use steering methods that employ a number of local rules to
animate a group of characters. While they work well for simple two-dimensional characters,
directly applying these methods to produce motion for human-like characters can lead to jittery
artifacts. Although more sophisticated methods might be used to handle this issue, these methods
can be time-consuming if we were to use them for a large number of characters.

Figure 1.1: Films and games are applications that motivate our work. Left: A scene from the
animated film Madagascar. Right: A screenshot from The Sims computer game.

Crowd animations can also look unnatural because they often use a pre-specified set of cyclic
motions. For example, a walk cycle is a motion clip consisting of two steps of walking for a
human-like character. To simulate a crowd of characters [66], a few of these walk cycles might
be used repeatedly for all the characters and all of the synthesized walking cycles. Similarly, in
crowd animations for films, an animator typically has a library of cyclic motions from which
to generate the motion for all the characters [99]. These cyclic motions lead to synthesized
animations that look monotonous and unrealistic. In applications such as games, the same motion
clips are usually replayed when they are needed. When a character needs to perform a football
throw, for example, a specific motion clip is found and then replayed. Since there may not be
many clips available for each specific motion, it will become apparent that the generated motions
are repetitive.

Given  that  the  existing  animation  systems  still  lack  many  desirable  properties,  we  want  to 
achieve  the  following  goals  in  our  work: 

• Generate motions that make the characters human-like and intelligent: we want to synthesize
navigation motion for multiple human-like characters avoiding obstacles and each
other in large and dynamic environments.

•  Generate  these  motions  efficiently:  create  a  multiple-character  navigation  framework  where 
the  characters  can  interactively  react  to  user  changes  to  the  environment. 

•  Synthesize  variations  in  these  motions:  learn  a  variation  model  from  example  motion  data, 
and  be  able  to  synthesize  variations  of  the  data  that  retain  features  of  the  original  examples 
but  are  not  exact  copies  of  them. 

In  this  thesis,  we  describe  three  approaches  that  we  have  developed  to  achieve  these  goals. 
The three sections in this chapter briefly summarize each of these approaches. Our main
contributions are:

•  A  planning  approach  that  applies  heuristic  search  methods  to  efficiently  generate  goal- 
driven  navigation  motion  for  virtual  human-like  characters.  Compared  to  methods  that  use 
large  data  sets  of  motion,  we  show  that  we  can  use  a  small  set  of  segmented  motion  clips  to 
generate  motions  for  a  large  number  of  characters  navigating  simultaneously  in  dynamic 
environments. 

•  A  novel  precomputation-based  approach  to  use  human  motion  data  to  generate  navigation 
motion:  we  first  precompute  a  search  tree  of  possible  motion  paths  with  the  data,  and  then 
use  a  backward  search  method  during  runtime  to  solve  planning  queries.  We  show  that 
our  approach  is  more  than  two  orders  of  magnitude  faster  than  traditional  forward  search 
methods  such  as  A*-search. 

•  We  present  a  technique  for  precomputing  large  diverse  trees,  and  explore  the  advantages 
and  disadvantages  of  our  method  compared  to  previous  methods  for  building  diverse  trees. 

• We study the problem of generating variation in motion data. Instead of considering
variation as an additive noise component, we take a data-driven approach and apply learning
techniques to this problem. We show that we can use Dynamic Bayesian Networks to
synthesize an unlimited number of variants automatically. Unlike the main previous
approach of adding noise, this process does not require tedious manual parameter
tuning.

•  We  show  that  we  can  use  our  method  to  model  and  synthesize  variation  for  many  types  of 
human  motion  data.  Our  model  takes  a  small  number  of  input  motions,  and  synthesizes 
spatial  and  temporal  variants  that  retain  original  features  of  the  inputs  but  are  not  exact 
copies  of  them.  Our  approach  is  novel  in  that  there  is  no  previous  automated  method  that 
can  generate  such  variants  for  human  motion  data. 
Figure 1.2: Left: A simple behavior graph of high-level motions used in our Behavior
Planning approach. Right: Planned behaviors for 100 animated characters navigating in a complex
environment.


1.1  Behavior  Planning 

We call our planning approach Behavior Planning. We build behavior-based graphs to represent
our motions, and automatically generate sequences of motions of characters navigating in
complex and dynamic environments [53]. The inputs are a starting location, a goal location, an
environment description, and a behavior graph. The algorithm generates a sequence of motions
that allow the character to navigate from the start to the goal.

This approach is well suited for creating long sequences of motions. For example, if a character
has to navigate through a terrain with obstacles it has to jump over or duck under, our
approach can easily create a sequence of actions that is natural and makes logical sense. We can
apply our technique to generate the motions for the characters in crowds and games. This is a
difficult problem because it is not trivial to generate natural and collision-free motions for a large
number of characters automatically and efficiently.

Our main ideas are to abstract motions as high-level actions, build a behavior graph of these
actions, and perform a global search of this graph to find a solution. Figure 1.2 (left) shows an
example of a behavior graph. Our approach can generate intuitive sequences of actions such as:
walk to the fence, duck under it, jog forward towards the stream and then jump over it. We show
results of up to 100 characters navigating in large and complex environments (Figure 1.2, right).
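
The idea of searching over a behavior graph can be illustrated with a small sketch. It is not the thesis implementation: the three behaviors, their step sizes and costs, the one-dimensional corridor of cells, and the function name `plan_behaviors` are all assumptions made for illustration. The sketch runs A* over (cell, last behavior) states, where the behavior graph restricts which behavior may follow another and only a jump may pass over an obstacle cell.

```python
import heapq

# Hypothetical behavior graph: each behavior advances the character by `step`
# cells, has a cost, and lists the behaviors allowed to follow it.
BEHAVIORS = {
    "walk": {"step": 1, "cost": 1.0, "next": ["walk", "jog", "jump"]},
    "jog":  {"step": 2, "cost": 1.5, "next": ["walk", "jog"]},
    "jump": {"step": 2, "cost": 3.0, "next": ["walk"]},
}
# Cheapest cost per cell (jog: 1.5 / 2), used as an admissible heuristic weight.
MIN_COST_PER_CELL = 0.75


def plan_behaviors(start, goal, obstacles, first="walk"):
    """A* over (cell, last behavior) states in a 1D corridor of cells."""
    frontier = [(MIN_COST_PER_CELL * (goal - start), 0.0, start, first, [])]
    closed = set()
    while frontier:
        _, g, x, last, path = heapq.heappop(frontier)
        if x == goal:
            return path
        if (x, last) in closed:
            continue
        closed.add((x, last))
        for name in BEHAVIORS[last]["next"]:
            b = BEHAVIORS[name]
            nx = x + b["step"]
            if nx > goal or nx in obstacles:
                continue  # overshoots the goal or lands on an obstacle
            # Only a jump may pass over an obstacle cell in between.
            if name != "jump" and any(c in obstacles for c in range(x + 1, nx)):
                continue
            ng = g + b["cost"]
            h = MIN_COST_PER_CELL * (goal - nx)
            heapq.heappush(frontier, (ng + h, ng, nx, name, path + [name]))
    return None  # no sequence of behaviors reaches the goal


if __name__ == "__main__":
    print(plan_behaviors(0, 6, {3}))
```

In this toy setting the planner has to walk up to the obstacle before jumping, because the assumed graph does not allow a jump directly after a jog; this is the kind of constraint that a behavior graph encodes.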

Additional value over previous work: We view this approach as one motion planning
method among a spectrum of methods (Figure 3.1). Our approach differs from previous methods
in that it has a carefully chosen planning space. Assumptions/Limitations: We assume that we
are given a set of segmented and blendable motion clips as input. The output sequence is limited
to be a concatenation of these input clips. Insights: The abstraction of motions as high-level
behaviors and the carefully chosen planning space lead to both the strengths and weaknesses of the
approach. The weakness is that we require segmented and blendable motion clips corresponding
to the high-level behaviors. The strength is that since the numbers of these behaviors and
input clips are small, the search algorithm is efficient and works well for generating navigation
motions. In addition, we empirically found that using an extremely small data set is enough for
generating many types of navigation motions.


1.2  Precomputed  Search  Trees 

Runtime efficiency is particularly important for games, virtual reality applications, and interactive
simulations. In many interactive systems, the AI that controls the autonomous characters is
often limited by the computation time that is available. It may only have a small fraction of a
second to decide what the characters should do. This leads to algorithms that generate simple
scripted behaviors; more complex behaviors are not possible due to the time constraints. Moreover,
the time constraints are worse if there are a large number of characters. This motivates our
investigation of the aspects of the planning and motion synthesis processes that can be precomputed
in order to achieve a faster runtime.

We argue that it is possible to trade off memory for speed by computing and storing as much
information as possible beforehand to allow a faster runtime search. We have demonstrated this
concept for human motion data [53]. Our method allows the runtime to be efficient while also
keeping the memory requirement to a reasonable amount.

The key idea is to precompute search trees (Figure 1.3, left) of motion clips that can be applied
to arbitrary environments. Instead of solving the usual planning problem with one start position
and one goal position for each character, we first ignore the obstacles and goal position in the
environment. We precompute a tree of all the reachable points of the character given existing
motion clips. We then use this tree along with a runtime backward search method to solve
planning queries for any configuration of obstacles and goal position.
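
A minimal sketch of this precompute-then-search-backward idea follows. It is a simplification under stated assumptions, not the system described in this thesis: the three clips and their integer grid displacements and costs are hypothetical, the character never rotates, and the alignment of the environment to the tree's coordinate frame is omitted. The tree is built once without any obstacles; at runtime, the nodes stored in the goal cell are tried in order of cost, and each is traced back towards the root until a blocked cell is found or the root is reached.

```python
from collections import defaultdict

# Hypothetical clips: name -> ((dx, dy) displacement in grid cells, cost).
CLIPS = {
    "forward": ((0, 1), 1.0),
    "left": ((-1, 1), 1.5),
    "right": ((1, 1), 1.5),
}


def precompute_tree(depth):
    """Exhaustively expand all clip sequences up to `depth`, ignoring obstacles."""
    nodes = [{"parent": None, "clip": None, "cell": (0, 0), "cost": 0.0}]
    goal_map = defaultdict(list)  # cell -> indices of nodes that end in that cell
    level = [0]
    for _ in range(depth):
        next_level = []
        for i in level:
            x, y = nodes[i]["cell"]
            for name, ((dx, dy), cost) in CLIPS.items():
                nodes.append({"parent": i, "clip": name, "cell": (x + dx, y + dy),
                              "cost": nodes[i]["cost"] + cost})
                goal_map[(x + dx, y + dy)].append(len(nodes) - 1)
                next_level.append(len(nodes) - 1)
        level = next_level
    for cell in goal_map:
        goal_map[cell].sort(key=lambda j: nodes[j]["cost"])  # cheapest first
    return nodes, goal_map


def backward_search(nodes, goal_map, goal, blocked):
    """Trace candidate nodes in the goal cell back to the root, skipping blocked paths."""
    for j in goal_map.get(goal, []):
        clips, k = [], j
        while k is not None and nodes[k]["cell"] not in blocked:
            if nodes[k]["clip"] is not None:
                clips.append(nodes[k]["clip"])
            k = nodes[k]["parent"]
        if k is None:  # reached the root without touching a blocked cell
            return clips[::-1]
    return None


if __name__ == "__main__":
    nodes, goal_map = precompute_tree(3)
    print(backward_search(nodes, goal_map, (0, 3), {(0, 1)}))
```

The runtime work is limited to list lookups and parent traversals, which is where the speed advantage over a forward search comes from; the price is the memory needed to store the tree and the per-cell node lists.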

For  distant  goal  positions,  we  first  use  a  fast  coarse-level  planner  to  generate  a  rough  path  of 
intermediate  sub-goals  to  guide  each  iteration  of  the  runtime  backward  search  phase.  While  the 
use  of  a  coarse-level  planner  for  handling  distant  goal  positions  is  one  possible  option,  it  may  not 
be  beneficial  in  some  cases.  For  example,  depending  on  the  discretization  of  the  coarse  planner, 
it  is  possible  that  it  will  return  no  solution  even  though  a  solution  exists.  We  therefore  explore 
methods  for  precomputing  larger  trees  so  that  it  is  not  necessary  to  use  the  coarse-level  planner. 

We originally built exhaustive trees of five to six depth levels. We can use these trees to show
that the concept of precomputation works. However, the path lengths in these trees are too small
and we cannot use them to solve more practical planning problems that require solutions of up
to fifty depth levels.

Figure 1.3: Left: An example of a precomputed search tree. This is a frequency plot; each point
represents the number of paths that can reach that point from the root of the tree. The root is near
the middle of the figure and the tree progresses in a forward direction (or up in the figure). This
tree has about 220,000 nodes and requires a storage memory of 10 MB. Right: Screenshot of
our interactive system. The characters respond to user changes interactively while navigating in
large and dynamic environments.

Since the exhaustive tree requires memory of about 1 GB to store the precomputed information,
we then experimented with a simple method to select subsets of paths from the exhaustive
tree. We call the selected subset a pruned tree. We found that we can use about 10 MB to store
the pruned tree, and we can use it to find solution paths that are similar to the ones from the
exhaustive tree.

However, the pruned trees still have small path lengths. This motivates us to build larger and
more general trees. In addition, we want to build trees that have diverse paths, which intuitively
means that the paths should evenly cover the region that they are in. The purpose of having
diverse paths is to allow the tree to handle as many environments (obstacle configurations and
goal positions) as possible. The main idea is to use a greedy, randomized method to
incrementally pick subpaths to add to the tree, starting from an empty tree. We use this method
to build trees of up to fifty depth levels. While this tree building method is simple and greedy,
we found that we can use it to successfully find solution paths in large environments with many
obstacles. We also show the advantages and disadvantages of our large and diverse tree compared
to previous methods for building similar kinds of diverse trees.
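
The following sketch shows one way a greedy, randomized tree builder of this kind could look. The selection rule is an assumption made for illustration and is not taken from the thesis: in each round a few random one-step extensions are sampled, and the one that lands in the least-covered grid cell is added. The clip displacements in `MOVES`, the parameter `candidates`, and the function name `build_diverse_tree` are likewise hypothetical.

```python
import random
from collections import Counter

MOVES = [(0, 1), (-1, 1), (1, 1)]  # hypothetical clip displacements in grid cells


def build_diverse_tree(max_nodes, max_depth, candidates=8, seed=0):
    """Grow a tree from the root, greedily preferring sparsely covered cells."""
    rng = random.Random(seed)
    tree = {0: (None, (0, 0), 0)}  # node id -> (parent id, cell, depth)
    coverage = Counter({(0, 0): 1})  # how many nodes end in each cell
    used = set()  # (parent id, move) pairs that are already in the tree
    # A full tree of this branching factor and depth bounds the node count.
    capacity = (len(MOVES) ** (max_depth + 1) - 1) // (len(MOVES) - 1)
    max_nodes = min(max_nodes, capacity)
    while len(tree) < max_nodes:
        expandable = [n for n, (_, _, d) in tree.items() if d < max_depth]
        best = None
        for _ in range(candidates):  # randomized: sample a few extensions
            parent = rng.choice(expandable)
            move = rng.choice(MOVES)
            if (parent, move) in used:
                continue
            _, (x, y), depth = tree[parent]
            cell = (x + move[0], y + move[1])
            # Greedy: keep the candidate that lands in the least-covered cell.
            if best is None or coverage[cell] < coverage[best[1]]:
                best = (parent, cell, depth + 1, move)
        if best is None:
            continue  # every sampled extension already existed; sample again
        parent, cell, depth, move = best
        used.add((parent, move))
        tree[len(tree)] = (parent, cell, depth)
        coverage[cell] += 1
    return tree


if __name__ == "__main__":
    tree = build_diverse_tree(max_nodes=200, max_depth=10)
    print(len(tree), "nodes covering", len({cell for _, cell, _ in tree.values()}), "cells")
```

Because only a fixed number of nodes is kept, such a tree can be much deeper than an exhaustive one for the same memory budget, at the cost of no longer containing every possible path.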

We demonstrate the efficiency of our technique across a range of examples in an interactive
application with multiple autonomous characters navigating in dynamic environments (Figure
1.3, right). Each character re-plans in real-time according to arbitrary user changes to the
environment obstacles or navigation goals. We empirically show that the runtime phase of our
technique is more than two orders of magnitude faster than traditional forward search methods;
this is under the condition that we are given a set of motion clips as input, and the output is a
concatenation of any sequence of these input clips.

Additional value over previous work: We show a complete system that demonstrates the
concept of precomputation for planning: we show how to precompute diverse search trees; we
describe an efficient runtime backward search method for solving planning queries; we use these
methods in actual planning scenarios and show runtime results; and we have an interactive
system with many characters navigating in complex environments using our approach.
Assumptions/Limitations: Similar to the behavior planning approach, we assume that we are given a
set of blendable and segmented motion clips as inputs. An output sequence is limited to be a
concatenation of the input clips. Insights: First, we have found that precomputation is certainly
a viable approach for motion planning. However, there is a tradeoff between memory, runtime
speed, and optimality. These issues should be considered before choosing between
precomputation and traditional search methods. Second, we show that precomputing diverse trees with
randomized approaches is fast and simple, yet works surprisingly well. Our trees can
solve more randomly-generated planning queries than previous methods [ITOl |30l for building
diverse trees.


1.3  Modeling  Spatial  and  Temporal  Variants  in  Motion  Data 

When a person performs the “same” motion more than once, each motion will be performed in a
slightly different manner. This is an important part of creating realistic motion that has not been
fully explored. For example, typical crowd animation systems [66] utilize a few walking motion
clips for every walking cycle and every character of the simulation. This can lead to synthesized
motions that look unrealistic due to the exact repetition of the original walk cycles. Hence a
variation model that can generate even slight differences of the original walk cycles has the
potential to greatly improve the naturalness of the output animations. Current games and films
[99] also do not typically produce human-like variations in their crowd simulations. In these
applications, as soon as even one example of repetition is identified, the whole animation can be
immediately deemed un-humanlike. This can make games and films less fun and interesting.

Previous methods consider variation to be an additive noise component, which is not robust
for automatically generating animations. While there are previous methods for adding noise
to existing motion [8, 75], there is no guarantee that the added noise will match well with the
motion. Adding noise is an ad-hoc process that requires manual parameter tuning.

Figure 1.4: A DBN for the variables Xi. Each node Xi represents one DOF in the
motion data. We use the prior network to model the first 2 frames of each input motion clip. The
transition network then models subsequent frames given the previous 2 frames. We assume a
2nd-order Markov property because it is the simplest model that works well.

We believe that variation should not be just an additive noise component. Instead, we take
a data-driven approach to this problem. Given a small number of examples of a particular type
of motion (e.g., cheering, walk cycle, swimming breast stroke) as input, we learn a model from
the input data, and use this model to synthesize spatial and temporal variants of that motion.
We claim that the Dynamic Bayesian Network (DBN) [24, 27] model solves this problem well
as it provides a formal and robust approach to model the distribution of the data. A Bayesian
Network (BN) represents a joint probability distribution of a set of related variables. A DBN
extends this distribution for temporal processes: it represents a joint probability distribution of a
set of related and time-dependent variables. In the case of motion data (Figure 1.4), the variables
are the degrees-of-freedom (DOF) in the data. It is this probability distribution from which we
sample to synthesize our new variants. There are three major steps for learning a model and
synthesizing new variants. First, we learn the structure of the DBN using the input examples.
We use a greedy algorithm based on a variant of the Bayesian Information Criterion score to
select a good structure. Second, we use the learned structure and the original data to synthesize
new variants. Third and optionally, we can use an inverse kinematics method developed within
our DBN framework to satisfy any foot and hand constraints.
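
To make the prior/transition split concrete, the following is a minimal sketch of sampling one DOF from a 2nd-order Markov model. It is not the learned DBN described above: it assumes a single DOF with a linear-Gaussian transition, and the names `sample_variant`, `prior`, `weights`, and `noise_std` are hypothetical. In the actual approach the dependencies between DOFs come from the learned structure.

```python
import random

def sample_variant(prior, weights, noise_std, n_frames, seed=0):
    """Sample one DOF trajectory from a 2nd-order Markov model (illustrative only).

    prior:   (mean0, mean1, std) for the first two frames (stand-in for the prior network).
    weights: (a1, a2, bias); frame t has mean a1*x[t-1] + a2*x[t-2] + bias
             (stand-in for the transition network, assumed linear-Gaussian).
    """
    rng = random.Random(seed)
    mean0, mean1, prior_std = prior
    a1, a2, bias = weights
    # The first two frames are drawn from the prior.
    frames = [rng.gauss(mean0, prior_std), rng.gauss(mean1, prior_std)]
    # Every later frame depends only on the previous two frames.
    while len(frames) < n_frames:
        mean = a1 * frames[-1] + a2 * frames[-2] + bias
        frames.append(rng.gauss(mean, noise_std))
    return frames[:n_frames]
```

Different seeds yield different trajectories from the same model, which is the sense in which sampling produces variants rather than copies.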

The key result of our method is that we can take a few examples of a particular type of
motion as input, and produce an unlimited number of spatial and temporal variants as output.
The new variants are statistically and visually similar to the inputs, but are not exact copies.
We demonstrate our approach with a variety of full-body human motion data. To evaluate our
approach, we perform a user study to show that: (i) our new variants are just as natural as motion
capture data, and (ii) our new variants are less repetitive than “Cycle Animation”. In addition, we
demonstrate  that  “just  adding  noise”  to  existing  motion  can  create  poses  and  timings  that  look 
obviously  awkward.  We  show  this  with  two  methods  to  add  noise  to  motion:  (i)  a  naive/strawman 
method,  and  (ii)  the  Perlin  noise  function.  Finally,  it  is  useful  to  know  what  types  of  inputs  work 
well  with  our  approach.  Hence  we  provide  a  DBN-based  method  to  select  a  subset  of  examples 
(from  a  larger  set)  that  would  work  well  with  our  approach. 

Additional  value  over  previous  work:  The  area  of  synthesizing  motion  variation  is  largely 
unexplored.  We  provide  a  data-driven  approach  to  take  a  few  examples  of  a  particular  type  of 
motion,  and  generate  spatial  and  temporal  variants  of  them.  Assumptions/Limitations:  We 
assume that the input examples come from a particular type of motion (e.g., walk cycles, jump
forward).  Our  current  approach  cannot  combine  different  types.  The  inputs  have  to  be  “similar”, 
but  we  specifically  describe  how  we  can  get  inputs  that  work  well  with  our  approach.  Since  we 
are  generating  variants  of  the  inputs,  the  outputs  can  sometimes  be  only  slightly  different  visually, 
although their numeric values can be quite different. Insights: It is possible to automatically
generate  spatial  and  temporal  variants  from  a  small  amount  of  data.  We  think  of  our  work  as  one 
step  towards  the  problem  of  motion  variation;  we  believe  that  the  overall  problem  is  still  largely 
unexplored  and  there  is  a  lot  more  that  can  be  done. 

Chapter 2
Related  Work 


We first review previous work in the area of crowd animation. We then discuss work that relates
to each major section of this thesis: motion planning methods for generating animations
of  virtual  characters,  precomputation  methods  for  animations,  and  techniques  for  modeling  and 
synthesizing  variation  in  motion  data. 


2.1  Crowd  Animation 

There  has  been  much  work  that  focuses  on  generating  crowd  animations.  We  review  previous 
work on steering approaches, particle-based approaches, and AI-based methods.

Steering Approaches. These approaches use a set of simple rules [87] to move each agent.
The  idea  is  that  a  combination  of  these  local  rules  applied  to  each  agent  can  lead  to  emergent 
behaviors  for  the  group  of  agents.  Since  this  method  requires  a  nearest-neighbor  computation 
for  each  agent,  the  basic  approach  has  a  quadratic  runtime  bottleneck.  Reynolds  uses  spatial 
hashing  to  solve  this  issue,  and  is  therefore  able  to  generate  an  interactive  flock  of  280  boids 
at 60 frames per second [83]. Recently he has extended this method to scale to even larger
numbers by incorporating a multi-processor approach [86]. This method can generate up to
15,000  individuals  at  60  frames  per  second. 
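
The effect of spatial hashing on the neighbor query can be sketched as follows. This is an illustrative uniform-grid hash, not Reynolds's implementation; the function names, the 2D setting, and the requirement that the query radius not exceed the cell size are assumptions made for the sketch.

```python
from collections import defaultdict
from math import floor, hypot

def build_grid(positions, cell_size):
    """Map each grid cell to the indices of the agents that fall inside it."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        grid[(floor(x / cell_size), floor(y / cell_size))].append(i)
    return grid

def neighbors(i, positions, grid, cell_size, radius):
    """Return sorted indices of agents within `radius` of agent i.

    Only the 3x3 block of cells around agent i is inspected, so the query
    is correct only when radius <= cell_size.
    """
    x, y = positions[i]
    cx, cy = floor(x / cell_size), floor(y / cell_size)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                if j != i and hypot(positions[j][0] - x, positions[j][1] - y) <= radius:
                    found.append(j)
    return sorted(found)
```

Each agent now compares itself only against agents in nearby cells instead of against every other agent, which removes the all-pairs scan when agents are spread out.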

The basic steering approach pioneered by Reynolds [87] has been extended for many different
scenarios. Anderson and his colleagues [3] developed a method to make it easier to control
the final motion. Although the emergent behavior is appealing, it is sometimes difficult to control
each agent’s motion. Their method can be used to allow the agents to meet specific position
constraints at specified times, while still retaining their crowd-like features. Lai and his colleagues
[fSUl took a set of simulated motions for a crowd of agents, and then clustered these motions
into different groups to build a graph-based structure of these motions. The idea is that they can
then “replay” the original motions in different ways to generate motions of the agents to follow a
user-drawn curve, for example. This also allows for better control of the motions for the group
of agents. Pelechano and her colleagues [73, 74] use a combination of psychological and
geometrical rules to get emergent behaviors for multiple agents. They are able to generate motions
for agents in a line formation and agents pushing each other in a crowded environment.

There exist many other methods to generate motions for a large number of 2D agents,
although they differ from the basic steering approach [87]. Blue and his colleagues [jTl use a small
set of rules to generate the behaviors of pedestrians. They can achieve real-life pedestrian
behaviors with their method. Their method uses a coarse discretization of the 2D space: while this
allows for a simplified model to study the behaviors that can be achieved, it is unrealistic for
motion synthesis for human-like characters. Kamphuis and his colleagues [|4^ focus on how
to generate motions for individuals to stay together as they move along a path. They can make
guarantees on the amount of dispersion of the individuals in the group. The idea is that given a
user-specified path or a pre-planned path, they can generate motions for the individual agents to
follow the path.

All of the above methods differ from our techniques in that they work well for generating
motions for simple “boid”-like characters in simple environments. Our techniques can generate
motions for human-like characters in large and dynamic environments. Steering approaches use
local policies or rules that are typically very fast to compute, but they often fail in complicated
maze-like environments with local minima. Large environments where indirect paths are needed
for the characters to reach a far away position can be troublesome for steering methods. Hence
steering methods work well in environments with few or no obstacles. The strength of our global
planning method is that it can handle large and complex environments. It is also difficult to
adapt steering methods appropriately for more complex character skeletons such as human figures.
These rule-based methods can generate 2D or 3D paths for “boid”-like characters to follow.
If we generate a 2D path for a human-like character to follow, this may cause a character to
turn suddenly to avoid something. Such sudden turns can result in jittery motions. Although
more sophisticated methods can potentially be used to handle this issue, employing them will
increase the runtime of the whole approach significantly, especially if there are a large number
of characters. In addition, applying simple rules cannot guarantee collision avoidance between
multiple characters. Our planning scheme applied to individual motion clips can handle these
issues.

Particle-Based Approaches. These methods model a force-based system affecting many
particles (or agents) instead of considering the motions of each agent separately. These models
often assume that the agents are particles of gases or fluids. Realistic crowd behaviors can be
generated by modeling the agents this way.

Helbing and his colleagues [l^[34ll model pedestrians as point particles. They use a
force-based model [|^ to “move” each particle. Each particle can be affected by attractive or
repulsive forces based on proximity to other particles and other objects in the environment. They
can then compute simulations of pedestrians forming lanes with other pedestrians walking in
the same direction. They can also generate two groups of pedestrians moving through a narrow
space in opposing directions. They also use a particle-based model of pedestrians [34] to study
how they interact in panic situations. They can then use their model to reproduce empirical
observations of real-life escape panic scenes. The significance of their work is to study how
pedestrians move in different situations. This allows them to propose ways to design structures
to alleviate the danger from crowd panic situations. Their work provides an empirical model to
study crowds of people. We do not deal with crowd behaviors in the same sense: our characters
are considered as individuals and there is no explicit interaction between them.
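
As a rough illustration of the attractive and repulsive forces described above, the sketch below advances a set of point particles by one explicit Euler step. It is a generic toy model, not the force model used by Helbing and his colleagues: the force laws, the unit mass, and the parameter names `attract`, `repel`, and `dt` are all assumptions.

```python
def step(positions, velocities, goals, dt=0.1, attract=1.0, repel=0.5, eps=1e-9):
    """Advance all particles one explicit Euler step (unit mass assumed)."""
    new_pos, new_vel = [], []
    for i, (p, v, g) in enumerate(zip(positions, velocities, goals)):
        # Attractive force pulls the particle toward its own goal.
        fx = attract * (g[0] - p[0])
        fy = attract * (g[1] - p[1])
        # Repulsive force from every other particle; its magnitude
        # falls off as 1/distance (eps avoids division by zero).
        for j, q in enumerate(positions):
            if j == i:
                continue
            dx, dy = p[0] - q[0], p[1] - q[1]
            d2 = dx * dx + dy * dy + eps
            fx += repel * dx / d2
            fy += repel * dy / d2
        vx, vy = v[0] + dt * fx, v[1] + dt * fy
        new_vel.append((vx, vy))
        new_pos.append((p[0] + dt * vx, p[1] + dt * vy))
    return new_pos, new_vel
```

All forces are evaluated from the positions at the start of the step, so every particle is moved at the same time, in contrast to a per-agent rule.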

Treuille and his colleagues [102] generate crowd motions by thinking of crowds of agents as
particles in a fluid. They model a potential field in the 2D environment, and use this potential
field to move all the characters at the same time. They can generate local collision avoidance
between the characters, and are able to achieve some nice crowd effects such as lane-forming.

In general, the above particle-based approaches model a force-based system to “move” all the
particles at the same time. This is in contrast to the agent-based methods for steering approaches.
These particle-based approaches work well for generating 2D paths for characters to follow.
Similar to steering approaches, they are difficult to extend to human-like characters. However,
they can model interactions between particles well. While our planning approaches work well
for human-like characters, we do not model any crowd-like interactions between them.

AI-Based Methods. These methods use artificial intelligence (AI) to autonomously generate
the motions for each character. Each character may have a certain emotional state; they are given
goals and can interact with others just like a real human crowd. These methods can generate
more realistic human-like behaviors compared to steering and particle-based approaches.

Funge and his colleagues [25] described an AI paradigm as part of an approach to control the
animation of characters given high-level controls. The approach has the same disadvantage as the
deliberative AI paradigm: specific behaviors have to be programmed into the system, and it is
difficult to extend simple pre-programmed behaviors to truly intelligent characters.

There are commercial systems [1, 2] that use AI techniques to generate motions for large
crowds. AI-Implant [1] can be used to simulate crowd and vehicle behaviors for games and
training simulations. One of its advertised strengths is the ability to have the characters adapt to
changing environments so that game programmers do not have to tediously script every scenario.
Massive [2] is a computer animation package for crowds. It was originally built for generating
the  battling  scenes  in  the  Lord  of  the  Rings  films.  It  can  be  used  to  quickly  create  a  large  number 
of  individual  agents;  AI  techniques  are  then  used  to  control  each  character  autonomously. 

Shao and Terzopoulos [|^ have developed a system of virtual humans behaving automatically
in a virtual train station. Their system combines goal-directed behaviors, reactive behaviors,
navigation rules, and perception capabilities to generate the motions for each character. However,
many of the rules or actions for the characters have to be manually defined, similar to the way
a large-scale AI system for crowds is built. Yu and Terzopoulos HI 1211 further incorporate the
use of decision networks from the learning community to reason about how the characters should
move in an uncertain world.

Our planning method is similar to the above AI-based approaches in that we also have to
pre-define  the  specific  behaviors  that  our  characters  can  perform.  Given  a  small  number  of  short 
motion  clips,  the  idea  is  to  globally  plan  for  a  sequence  of  them  when  needed  during  runtime,  so 
that  we  do  not  program  or  script  every  scenario  as  many  games  do.  We  show  that  we  can  take  a 
small  number  of  motion  clips,  and  apply  a  variety  of  planning  techniques  to  generate  some  nice 
behaviors for many characters. We do not build an AI system of the characters: we only focus
on  the  planning  and  navigation  of  the  characters’  motions. 


2.2  Motion  Planning  Methods  for  Animation 

We start by reviewing the simplest approach for planning a character’s motion and then
progressively discuss more complicated planning techniques. We then discuss work on motion graphs
and  move  trees. 

Planning Approaches. There has been much work that uses a planning approach for
synthesizing animations. The simplest approach is to take a 2D top view representation of the
environment, and plan for a 2D path that travels from the start position to the goal position while
avoiding the obstacles. The character is bounded by a cylinder for the purpose of collision
detection. The obstacles are then enlarged by the radius of this cylinder (or circle in the top view),
and the character is then represented by a point. The problem then becomes finding a valid path
for this point [16^110411.
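
The obstacle-enlargement idea can be sketched on an occupancy grid: grow every obstacle cell by the character's radius, then search for a path for a point. This is a minimal illustration under assumptions that are not part of the original text: a discrete grid, a radius measured in cells, 4-connected moves, and breadth-first search standing in for whatever path planner is actually used. The function names are hypothetical.

```python
from collections import deque

def inflate(obstacles, radius, width, height):
    """Return all grid cells within `radius` (Euclidean, in cells) of any obstacle cell."""
    blocked = set()
    r = int(radius)
    for ox, oy in obstacles:
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                x, y = ox + dx, oy + dy
                if dx * dx + dy * dy <= radius * radius and 0 <= x < width and 0 <= y < height:
                    blocked.add((x, y))
    return blocked

def shortest_path(start, goal, blocked, width, height):
    """Breadth-first search for a point moving through free cells; None if unreachable."""
    if start in blocked or goal in blocked:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start to recover the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in blocked and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None
```

Because the obstacles have already been grown by the character's radius, the search itself never needs to reason about the character's size.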

Kuffner [49] uses this approach to synthesize motions for a character navigating in a
maze-like environment. A proportional derivative controller is then used to allow the character to
follow the path. Cycles of walking motions are played back in order to simulate the actual
motions of the character. Pettre and his colleagues [TTTI also use this approach to generate a valid
path. However, they use a probabilistic roadmap iHTl technique to search for this path. Their
path is built to be a composition of Bezier curves. A controller is then used to blend existing
data to generate the walking cycles. A warping procedure lfT3l 1 10911 is applied to the arms for
avoiding obstacles, and to allow for more detailed “reaching through a fence” motion.

Esteves and her colleagues [l2T]l also use this 2D view of the character. Their system,
however, is more complex because it allows for multiple characters to perform cooperation tasks.
They first plan paths for their characters and an object being manipulated with a roadmap
technique; these paths are then converted into trajectories. A locomotion controller is used
to synthesize the motions of the feet, and inverse kinematics (IK) is used to synthesize the
motions of the arms. The strength of their work is in the combination of these techniques to generate
cooperation tasks for multiple characters.

An important difference between these methods and our Behavior Planning method [52] is
that we do not first plan a 2D path in the environment and then make the characters follow the
path. The actions of the planning algorithm are motion clips that correspond to behaviors like
jumping or jogging forward. So we plan for sequences of these behaviors for the character to
reach a desired goal position. This has the advantage that the synthesized motions will already
be natural by construction. Crowd and game systems that have their characters follow a 2D
path often contain unnatural jittery motions.
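
Planning over behaviors rather than over 2D positions can be illustrated with a toy search in which each action is a whole behavior. Everything specific below is a hypothetical stand-in, not the thesis's behavior graph: the grid world, the four headings, the behavior table with its displacements and costs, and the use of uniform-cost search.

```python
import heapq

HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]
# Hypothetical behaviors: name -> (cells moved forward, heading change, cost).
BEHAVIORS = {
    "jog_forward": (1, 0, 1.0),
    "turn_left": (0, 1, 0.5),
    "turn_right": (0, -1, 0.5),
    "jump_forward": (2, 0, 1.5),  # only the landing cell is checked, so it can clear one cell
}

def plan_behaviors(start, goal, blocked, width, height):
    """Uniform-cost search over behaviors.

    start is (x, y, heading_index); goal is an (x, y) cell.
    Returns the cheapest list of behavior names, or None if the goal is unreachable.
    """
    frontier = [(0.0, start, [])]
    best = {start: 0.0}
    while frontier:
        cost, (x, y, h), plan = heapq.heappop(frontier)
        if (x, y) == goal:
            return plan
        if cost > best.get((x, y, h), float("inf")):
            continue  # stale queue entry
        for name, (dist, turn, c) in BEHAVIORS.items():
            dx, dy = HEADINGS[h]
            nx, ny, nh = x + dist * dx, y + dist * dy, (h + turn) % 4
            if not (0 <= nx < width and 0 <= ny < height) or (nx, ny) in blocked:
                continue
            state, new_cost = (nx, ny, nh), cost + c
            if new_cost < best.get(state, float("inf")):
                best[state] = new_cost
                heapq.heappush(frontier, (new_cost, state, plan + [name]))
    return None
```

The output is directly a sequence of behaviors to play back, so no separate path-following step is needed.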

Manipulation planning allows these virtual characters to interact with the environment. The
key idea is to separate a manipulation task into transit and transfer paths [44]. Transit paths allow
the arms to move to and from the object being manipulated. Transfer paths generate the object
movements and the corresponding arm motions. The movement in the six DOF space (three for
translation and three for rotation) of the object is first planned. An IK algorithm is then used to
allow the arms to grasp this object during the transfer path. Yamane and his colleagues UllOII
also use this approach to generate animations of manipulation tasks. They use a natural velocity
profile to convert paths to motion trajectories. Liu and Badler [|6^ create reaching motions
similarly by first searching for a path for the end-effector (i.e., the hand), and then using IK to
fit the rest of the body to the positions of the end-effector. They use the depth buffer to perform
more efficient collision detection. Our method does not generate any character interactions with
objects in the environment. We focus only on navigation methods for the characters to avoid
obstacles in large and dynamic environments.

Roadmap-based approaches apply Probabilistic Roadmap (PRM) techniques iHTl to
animation. Roadmaps were originally developed in robotics to handle situations that have a large
number of DOFs. They are also well suited for problems where multiple queries or solutions are
required. For animation, roadmaps are used in a similar way in that we first build a graph of
where the characters can move in the environment. However, this graph is often constructed in
the space of the environment that the characters move in, as opposed to the configuration space of
the objects being moved in robotics. This graph or roadmap is often built in a lower dimensional
space based on human knowledge about how the characters can navigate.

Salomon and his colleagues [89] first build a visibility-based PRM [94] that covers the
reachable space of a building. They then use this visibility-based PRM to find a valid path between start and
goal locations. The user can also locally steer these characters along the generated path. Choi
and his colleagues ffT^ use a roadmap-based approach to first construct a graph of possible foot
placements and transitions in the environment. The graph is searched for a solution path based on
the given start and goal foot placements. Motion captured data is then adapted to fit this sequence
of foot constraints to synthesize motion. Bayazit and his colleagues (bl build a roadmap of how
groups of boids [87] can navigate in the environment. Each boid can use the same roadmap.
Each boid can also update the weights of the edges dynamically to generate, for example, a
goal-searching behavior for the group. Pettre and his colleagues ITT^ use a similar approach to build
a navigation graph to animate crowds of characters.

As a more general approach to Bayazit et al.’s work [[^, we can have roadmaps where the
nodes and/or edges are reactive to changes in the environment. The Elastic Bands method
[Quinlan and Khatib [83]] allows paths to deform so that these paths can avoid obstacles dynamically.
Elastic Strips [Brock and Khatib [13]] allows the computation of the paths to be done in the
workspace rather than the configuration space, which leads to a faster runtime. Elastic Roadmaps
[Yang and Brock [TTT]| allows the roadmap itself to adapt. The milestones of the map can move
and the map can be continuously updated. In addition, Reactive Deforming Roadmaps [Gayle
et al. [26]] or Adaptive Elastic Roadmaps [Sud et al. [93]] model the nodes and edges of the roadmap
as a physical system. These nodes and edges can be removed and added adaptively as the agents
and obstacles move in the environment. In general, roadmap-based approaches are usually used
to first build a structure that specifies the reachable space of the environment.

RRT-based approaches apply Rapidly-Exploring Random Tree techniques [54, 55] to
animation. RRTs were originally developed in robotics to handle problems with nonholonomic
constraints and many DOFs. For animation, they are used to efficiently sample spaces with
many DOFs. They work well for generating arm motions, for example, while doing 3D collision
avoidance. However, there is no guarantee that the motions are natural.
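
For readers unfamiliar with RRTs, the basic tree-growing loop can be sketched in 2D as follows. This is a textbook-style illustration, not any of the cited systems: the 2D point state, the goal bias, the fixed step length, and the caller-supplied `is_free` predicate are assumptions, and only the new node (not the connecting edge) is collision-checked.

```python
import random
from math import hypot

def rrt(start, goal, is_free, bounds, step=0.05, goal_tol=0.05,
        max_iters=5000, goal_bias=0.1, seed=0):
    """Grow a tree from `start` toward random samples; return a path to `goal` or None."""
    rng = random.Random(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # Occasionally aim straight at the goal, otherwise at a random point.
        if rng.random() < goal_bias:
            target = goal
        else:
            target = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # Find the existing tree node nearest to the target.
        near = min(range(len(nodes)),
                   key=lambda k: hypot(nodes[k][0] - target[0], nodes[k][1] - target[1]))
        nx, ny = nodes[near]
        d = hypot(target[0] - nx, target[1] - ny)
        if d == 0:
            continue
        # Extend from the nearest node by at most one step toward the target.
        s = min(step, d)
        new = (nx + s * (target[0] - nx) / d, ny + s * (target[1] - ny) / d)
        if not is_free(new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = near
        if hypot(new[0] - goal[0], new[1] - goal[1]) <= goal_tol:
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None
```

The returned path is a chain of short straight segments, which is consistent with the remark above that nothing in the procedure makes the resulting motion look natural.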

An RRT algorithm [47] can be used to generate the motions of the manipulated object HIIOH.
In this case, the human motions are synthesized by using IK to fit the character’s pose to the
object. Kallmann and his colleagues [[3^ used RRTs to generate reaching motions. They used
the idea of RRTs to build and search a roadmap of possible configurations in a pre-defined
22-DOF space. Their technique, however, requires a lot of manual processing to define this space.
They also take a straight line in this 22-DOF space to be “smooth”; hence the motions may not be
natural. Applying planning methods in such a large space is usually not a good idea as the runtime
of these algorithms will increase significantly. In addition, planning on the individual joints of a
human-like skeleton does not guarantee natural motions. Planning on all of the individual joints
of a human-like skeleton is the opposite extreme from the simplest case of planning first in the 2D (top
view) environment space. Our behavior planning method avoids both of these extremes to plan
for natural sequences of motions efficiently.

There are systems that are explicitly designed to handle human-like figures with many DOFs.
Dontcheva and her colleagues ifTTl presented a layered approach to acting out motions. Each
layer specifies the motions of some of the DOFs. For example, the motions of a kangaroo were
created with six layers. The first layer specified the large-scale movement of the kangaroo: its
motion trajectory. The second layer added details to the motions of the legs. The other layers
added more details to the torso, head, arms, and tail. Ching and Badler [fTSll created motions
for a human-like figure by splitting the DOFs into parts, and dealing with each part
sequentially given the motions of the previously generated parts. Brock and Kavraki [fTTll proposed a
decomposition-based approach for robot manipulators. They solve for a “tunnel” that is a low
dimensional space that includes potential solution paths. They then solve for the final solution
by searching in this tunnel. The overall idea is to split up the original problem into simpler
subproblems, which results in a reduction of the problem complexity.

Motion Graphs and Move Trees. Our work is closely related to techniques [jH |28l [451 |59l |80l
that build graph-like data structures of motions. These approaches facilitate the re-use of large
amounts of motion capture data by automating the process of building a graph of motion. Our
behavior planning approach [52] is similar in that it also builds graphs of motion data. In our
case, we abstract our data into high-level behaviors. This representation offers a number of
advantages: it can generate intuitive sequences of motions according to the high-level behaviors
and it needs only a small amount of data to generate interesting motions.

Sung and his colleagues [98] use the idea of motion graphs to generate the motions for many
characters. The key difference of their work is that they can allow the characters’ motions to
satisfy position, orientation, and/or timing constraints. This allows for more flexibility than
just re-playing the existing motion clips as in the motion graph approach.

Similar to crowd systems, interactive applications such as games often use local policy
methods to generate motions for their characters. Hence these methods have many of the same
disadvantages as the crowd systems discussed before: they have problems with local minima, and they
do not work as well for human-like characters since they can generate only 2D or 3D paths. In
our behavior planning method, the representation of motion data as high-level behaviors is
reminiscent of move trees [|6^[69l, which are often used in games to represent the motions available
for  a  character.  However,  our  method  applies  a  global  planning  technique.  It  can  output  motions 
that  avoid  local  minima,  and  generate  motions  for  characters  navigating  in  a  large  and  dynamic 
environment. 

Behavior Planning: Additional value over previous work. The main difference between
these previous methods and our Behavior Planning method [52] is that we apply a global planning
approach in a carefully chosen action space. The actions are high-level behaviors such as
jogging forward and jumping: this means that we are planning for sequences of these behaviors.
We do not plan in the 2D or top view of the environment first, and we do not plan in the
combined joint angle space of all the joints of a human skeleton. Instead our planning space lies
somewhere in between these two extremes, and we view our planning approach to be one method
among a spectrum of methods. The carefully chosen action space allows our method to be fast
and to require only a small amount of memory. Our results show that our method works well
for  generating  navigation  motions:  (i)  we  can  use  a  small  set  of  short  motion  clips  to  generate  a 
large  variety  of  motions  for  characters  navigating  in  large  environments,  and  (ii)  we  can  apply  a 
collection  of  existing  planning  methods  to  the  basic  setup  to  handle  more  complicated  scenarios 
such  as  dynamic  obstacles  and  multiple  character  motion  generation. 


2.3  Precomputation  Methods  for  Animation 

The idea of precomputation has been used for animation, both for evaluation and efficient generation
of motions. We discuss these methods, and identify the differences between them and our
Precomputed Search Trees [53] approach. There has been recent work in the robotics community
on  achieving  path  diversity  among  a  precomputed  set  of  paths.  We  then  discuss  the  similarities 
and  differences  of  these  methods  compared  to  ours. 

Reitsma and Pollard [83, 84] introduced the idea of embedding a motion graph into a 4-D
grid.  The  4-D  grid  then  represents  the  possible  ways  the  character  can  move  in  the  environment. 
The  embedding  works  for  a  specific  static  environment.  This  is  not  a  problem  for  their  work 
because their focus was on the evaluation of the character’s moving capabilities given a set of
motions. In contrast, we are concerned with improving the runtime speed of our search technique.
Their embedding of motion data and our precomputed trees data structure are similar in
that both represent the possible paths along which the character can move. Both of them can be used to
synthesize motions for a character navigating in the environment. On the other hand, one important
difference between their method and ours is that they take the environment into account
when they build their data structure. They also build paths for all potential start positions and
all goal positions. In our case, we build a tree that represents where the character can go given
its current position. This corresponds to having one start position (for the root of the tree) and
all the possible goal positions. Most importantly, we do not take the environment into account
when we build this tree. This allows us to reuse the tree in different parts of a large environment.
We can also deal with dynamic environments as we can map the obstacles to the tree to deal
with collision detection when needed. Moreover, we can use the same tree for all the characters.
Thus our method is particularly good for efficient pathfinding for many characters in a large and
continuously changing environment.
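
As an illustration only, the following Python sketch shows in miniature how a tree built without reference to the environment can be reused for different obstacle layouts. It is not the implementation described in this thesis: the clip set `CLIPS`, the cell size, all function names, and the exhaustive enumeration of clip sequences are assumptions made for the example, and collisions are checked only at node positions rather than along the motion.

```python
import math
from collections import defaultdict

# Hypothetical clip set: (name, forward distance, heading change in radians).
CLIPS = [("jog", 1.0, 0.0), ("left", 0.7, math.pi / 4), ("right", 0.7, -math.pi / 4)]


def build_tree(depth):
    """Enumerate every clip sequence up to `depth` in root-local coordinates.

    No obstacle information is used here, so the result can be shared.
    Each node is (x, y, heading, parent_index, clip_name).
    """
    nodes = [(0.0, 0.0, 0.0, -1, None)]
    frontier = [0]
    for _ in range(depth):
        next_frontier = []
        for i in frontier:
            x, y, heading, _, _ = nodes[i]
            for name, dist, turn in CLIPS:
                h = heading + turn
                nodes.append((x + dist * math.cos(h), y + dist * math.sin(h), h, i, name))
                next_frontier.append(len(nodes) - 1)
        frontier = next_frontier
    return nodes


def cell_of(x, y, size=0.5):
    """Map a position to an integer grid cell."""
    return (math.floor(x / size), math.floor(y / size))


def build_gridmap(nodes, size=0.5):
    """Index tree nodes by the grid cell that contains them."""
    grid = defaultdict(list)
    for i, (x, y, *_rest) in enumerate(nodes):
        grid[cell_of(x, y, size)].append(i)
    return grid


def query(nodes, grid, goal_xy, blocked_cells, size=0.5):
    """Runtime lookup: start at nodes in the goal cell and trace back to the root.

    A candidate is rejected if any node on its path lies in a blocked cell.
    Returns the clip names from root to goal, or None if no stored path works.
    """
    for i in grid.get(cell_of(*goal_xy, size), []):
        path, j, ok = [], i, True
        while j != -1:
            x, y, _, parent, name = nodes[j]
            if cell_of(x, y, size) in blocked_cells:
                ok = False
                break
            if name is not None:
                path.append(name)
            j = parent
        if ok:
            return path[::-1]
    return None
```

In this sketch the goal and the blocked cells are expressed in the coordinate frame of the tree root, so the same `nodes` and `grid` could be shared by every character once a query is transformed into that character's local frame.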

Some of the findings in [83, 84] motivated the design of the motion clips we used. They
found that an extremely small dataset was capable of producing good behavior, and that adding
duplicate motions provided a relatively small improvement in path quality and coverage. Hence
we were able to use a rather small set of motion clips. We also found that we were able to get a
large combination of overall motions from a small set of motion clips, given that there is some
variety in these clips (a few different types of turning and a few slightly different jog forward
clips, for example) and that they are relatively short compared to the size of the environment.
Therefore, we do not find scalability to be an issue in our framework: a large amount of data is
not  needed. 

Sukthankar and her colleagues [97] also use the idea of un-rolling a motion graph to cover
an environment. They find the paths and costs for the character to reach each point in a 2D grid.
Their embedding also takes obstacles into account. This is again different from our work. Our
focus is on designing a system that allows for efficient runtime searches, while being able to
handle dynamic obstacles and multiple characters.

Lee and Lee [58] have also explored the idea of precomputation. They preprocess sets of
motion data to compute a control policy for a character’s actions. This policy was then used
at runtime to efficiently animate boxing characters. There are also existing approaches that use
reinforcement learning to compute control policies for generating a character’s motions. Ikemoto
and her colleagues llTTll generate motion for autonomous agents with a controller derived
from reinforcement learning methods. Treuille and his colleagues [103] compute near-optimal
controllers to interactively control the motions of human-like characters. McCann and Pollard
[|^ also compute controllers based on reinforcement learning. They choose the character’s next
fragment  of  motion  based  on  the  current  player  input  and  the  previous  motion  fragment.  Our 
approach  is  different  from  these  control  policy  based  methods  in  that  we  precompute  a  search 
tree  of  all  possible  future  actions  given  some  existing  motions.  We  then  build  gridmaps  over  this 
tree  so  that  we  can  efficiently  index  to  the  relevant  portions  of  the  tree  during  runtime. 

Go and his colleagues [29] used the idea of precomputation to create animations of vehicles.
Their method allows automatic steering of 3D points by selecting the best trajectory among a set
of precomputed ones. Our work is similar in that we precompute potential character trajectories
to speed up the runtime search. Their method, however, works well for producing 3D paths
for boids, birds, or spacecraft. The kinds of paths that they produce can also be created from
steering approaches [87] for crowds. Since these boid-like characters can move around with
great flexibility in the 3D space, they cannot exploit the strengths of precomputation as much.
For  the  case  of  virtual  humans,  there  is  much  less  flexibility  in  the  ways  that  they  can  move.  It 
is  common  for  game  and  crowd  systems  to  use  specific  motion  clips  to  animate  their  characters. 
Hence  our  method  can  exploit  the  strengths  of  precomputing  these  potential  paths  for  efficient 
runtime  search  later. 

There are many approaches that first compute some kind of map of the environment before
the actual motion is generated. Sung and his colleagues [98] use a two-level planning technique
to animate their characters: a probabilistic roadmap approach llTll first generates approximate
motion paths; these paths are then refined to precisely satisfy certain constraints. Sud and his
colleagues [95] first compute a multi-agent navigation graph based on the configurations of the
agents and obstacles. The first and second order Voronoi diagrams of these configurations are
used to compute the navigation graph. This graph can then be used to create a maximum clearance
path for each character. Van den Berg and his colleagues [106] first compute a global
roadmap of the environment to connect the reachable parts of the space. They then use a reciprocal
velocity obstacles method [105] to generate smooth navigation paths for each character.
The work in [106] has the same high-level goals as our precomputed trees work. In their work,
they  can  create  the  motions  of  thousands  of  2D  agents.  We  generate  the  motions  of  up  to  150 
human-like  characters.  Their  work  is  well  suited  for  generating  paths  for  2D  agents  while  our 
method  can  be  used  to  generate  natural  motions  for  human-like  characters. 

We now discuss previous work related to the issue of path diversity. The motivation for studying
this issue is that while our original system [53] demonstrates the concept of precomputation,
it does not build scalable and diverse trees that can be used in more general planning scenarios.
This is also true for many previous methods [fTO [T^ [30l l43ll, where the number and length of
paths built are too small to use for general planning problems.

Our original method [53] builds trees that have a limited depth level. We showed that the
concept of precomputation can lead to a faster runtime, but we showed these results either for
problems with very small environments or problems requiring a two-level hierarchical approach.
Using a two-level approach made it difficult to compare the advantages and disadvantages of
the precomputation method against A*-search methods. We were unable to make this comparison
fairly in the original system because it is difficult to build large trees effectively. While
there are cases when it is beneficial to combine two-level approaches with precomputed search
trees, building precomputed trees of at least a reasonable size and comparing them to traditional
forward search methods are still important issues.

Green and Kelly [[^ and Branicky et al. IfTOl describe methods to take an existing set of
paths in a tree and select a smaller set from it that is as diverse as possible. These methods
require at least quadratic time with respect to the number of paths; hence they can only be used
for small path sets. Furthermore, they both require the existence of a set of paths from which to
select. Generating the exhaustive set of paths to select from only works for trees with small
depth levels, and this is in fact the approach that we take in our experiments. For larger trees,
generating an exhaustive set requires exponential time with respect to the depth level, and it is
not clear how we can generate a subset of paths from which to begin selecting. Indeed, our
randomized-based tree precomputation method solves this problem: how to generate such a set
of paths for trees of large depths while keeping the paths diversified enough that the trees can be
used to solve as many planning queries as possible. We show empirical results comparing our
original method [53], our randomized-based method for computing large and diverse trees, and
three other methods from previous work llTOll^.
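
To make the cost of selecting a diverse subset from an existing path set concrete, here is a generic greedy farthest-first sketch in Python. It is not the algorithm of any of the cited methods: the distance measure `path_distance`, the function names, and the choice to seed the selection with the first path are assumptions made for illustration.

```python
import math


def path_distance(p, q):
    """Mean pointwise Euclidean distance between two equal-length 2D paths."""
    return sum(math.dist(a, b) for a, b in zip(p, q)) / len(p)


def select_diverse(paths, k):
    """Greedily pick up to k path indices, each as far as possible from those already chosen."""
    chosen = [0]  # assumption: seed the selection with the first path
    # nearest[i] is the distance from path i to its closest already-chosen path
    nearest = [path_distance(p, paths[0]) for p in paths]
    while len(chosen) < min(k, len(paths)):
        best = max(range(len(paths)), key=lambda i: nearest[i])
        chosen.append(best)
        for i, p in enumerate(paths):
            nearest[i] = min(nearest[i], path_distance(p, paths[best]))
    return chosen
```

Each of the k rounds compares every path against the newly chosen one, so the number of path comparisons grows as k times the size of the input set, and the whole input set must exist before selection can start.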

Erickson and LaValle [19] also explore the idea of path diversity. They introduce a survivability
criterion that can decrease the likelihood that numerous paths will be obstructed by the
same obstacle. Given a larger collection of paths to begin with, they use this criterion to select
a smaller subset of paths. Knepper and Mason [43] explore the similar idea of taking a set of
paths and selecting one of them for execution. Their focus is on comparing the static
and dynamic planning cases for this idea of selecting one path among many to execute. Both of
these recent works [19, 43] consider path sets that are very small. This is an important limitation
because their small path sets can work in environments that are small and relatively uncluttered.
Given an environment with many obstacles, their path sets will not return a solution even though
one exists. While their work focuses on the analysis of the path sets, it is difficult to use them in
real planning scenarios of large environments cluttered with many obstacles.

Our method for building large and diverse trees is related to sampling-based planning approaches.
In our algorithm, we choose nodes from which to expand in the same way as
Rapidly-Exploring Random Trees (RRTs) [57]. We choose nodes this way for the same reason
as RRTs do: so that the selected nodes will be evenly spaced and not biased towards a particular
region. Our algorithm differs from RRTs as we use a metric to locally pick paths that are as
evenly spread out as possible; this process does not take the obstacles into account as the tree
is being precomputed. Probabilistic Roadmaps (PRMs) WBi are effective for planning in high-dimensional
spaces. They first build a roadmap for a given environment and then use it to find
solution paths. Our method also has a preprocessing phase, but our precomputed tree can then be
used during runtime for any obstacles and any start/goal queries. An extended version of PRMs
[60] first builds a tree without taking obstacles into account and later maps them back into the
environment.  The  difference  in  our  case  is  that  since  we  have  a  set  of  actions  as  input,  we  first 
build  the  tree  in  the  action  space.  This  is  more  general  because  each  path  of  the  tree  can  later 
fit  anywhere  in  the  environment.  Finally,  the  key  difference  between  our  method  and  RRTs  and 
PRMs  is  the  overall  precomputation  concept  to  first  precompute  a  tree  and  then  use  a  runtime 
backward  search  to  find  a  solution. 
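
The RRT-style rule for choosing which node to expand can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the tree-building algorithm of this thesis: the action set `ACTIONS`, the sampling bounds, and the function names are hypothetical, and the action applied at the chosen node is picked at random here instead of by a spread-maximizing metric.

```python
import math
import random

# Hypothetical action set: 2D displacements standing in for motion clips.
ACTIONS = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]


def pick_node_to_expand(nodes, bounds, rng):
    """Sample a uniform random point and return the index of the nearest tree node.

    Nodes on the frontier of the tree are nearest to large unexplored regions,
    so they are chosen more often and the tree spreads out evenly.
    """
    (xmin, xmax), (ymin, ymax) = bounds
    target = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
    return min(range(len(nodes)), key=lambda i: math.dist(nodes[i], target))


def grow_tree(n_expansions, bounds, seed=0):
    """Grow an obstacle-free tree; returns node positions and parent indices."""
    rng = random.Random(seed)
    nodes = [(0.0, 0.0)]
    parents = [-1]
    for _ in range(n_expansions):
        i = pick_node_to_expand(nodes, bounds, rng)
        dx, dy = rng.choice(ACTIONS)  # placeholder for a diversity-aware choice
        nodes.append((nodes[i][0] + dx, nodes[i][1] + dy))
        parents.append(i)
    return nodes, parents
```

No obstacle test appears anywhere in the loop, which mirrors the point made above that the environment is not consulted while the tree is being precomputed.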


Precomputed Search Trees: Additional value over previous work. Precomputation Concept:
Our approach differs from the standard planning approach and the previous precomputation
methods.  The  standard  planning  approach  takes  one  starting  location  and  one  goal  location  as 
input,  and  generates  a  solution  from  the  start  to  the  goal.  It  builds  one  search  tree  for  each  query. 
The  main  idea  of  our  approach  is  to  precompute  a  tree  of  potential  paths.  This  tree  can  be  used 
for  any  configurations  of  start  and  goal  locations.  With  respect  to  the  previous  precomputation 
methods,  it  is  common  to  first  compute  some  kind  of  map  of  the  environment  before  motions  are 
generated.  The  main  disadvantage  of  this  is  that  if  the  environment  changes,  the  map  has  to  be 
recomputed  every  time.  The  key  to  our  approach  is  that  we  first  compute  a  large  and  diverse  tree 
of  motion  paths  given  a  set  of  human  motion  data.  We  empirically  show  that  we  can  re-use  the 
same  tree  for  a  large  number  of  obstacle  configurations  and  goal  locations.  We  can  also  re-use 
the  same  tree  for  different  parts  of  the  environment  even  if  it  is  large.  As  a  result,  we  found  that 
by  doing  this  precomputation,  we  can  generate  the  motions  for  a  large  number  of  human-like 
characters  interactively.  Path  Diversity:  While  there  has  been  much  recent  work  on  studying 
path diversity, the path sets that these works analyze are very small and are difficult to use in
actual planning scenarios. We describe a simple but effective randomized-based method to precompute
large and diverse trees. We provide empirical results to compare our trees with trees
built  with  previous  methods.  We  also  provide  empirical  results  to  compare  the  precomputation 
concept  with  traditional  forward  search  methods  with  actual  planning  queries.  We  can  make  this 
comparison fairly by using the large and diverse trees that we built. Such a comparison is useful
for researchers as the tradeoffs between precomputation and forward search methods must
be  understood  before  deciding  to  use  one  of  these  methods.  Complete  System:  Finally,  our 
work  shows  a  complete  system  that  demonstrates  the  concept  of  precomputation  for  planning: 
we  show  how  to  precompute  large  and  diverse  search  trees;  we  describe  an  efficient  runtime 
backward  search  method  for  solving  planning  queries;  we  use  these  methods  in  actual  planning 
scenarios  and  show  runtime  results;  and  we  have  an  interactive  system  with  many  characters 
navigating  in  complex  environments  using  our  approach. 


2.4  Modeling  and  Synthesizing  Variation 

Variability  in  human  motion  has  been  studied  for  generating  diversity  in  motions.  One  major 
previous approach for generating variation in motions is to add noise. Perlin [75] adds noise
functions to procedural motions to create more realistic animations of running, standing, and
dancing. Bodenheimer and his colleagues [[U add noise to cyclic running motions. They constructed
a noise function based on biomechanical literature to vary the joint angles in the upper
body. The noise is added only to the upper body, and is synchronized with the arm swings in the
running cycle. Adding noise in such a supervised way requires human knowledge and parameter
tuning. Our approach is fundamentally different because the variations that we generate automatically
come from the data and are not a separate additive component. Instead we learn a joint
probability  distribution  of  the  motions  from  the  data,  and  then  use  this  distribution  to  generate 
new  motions. 

Pullen and Bregler’s work [79] to generate motions that are slightly different but similar to the
original  data  is  most  closely  related  to  our  work.  They  model  the  correlations  between  the  DOFs 
in  the  data  with  a  distribution,  and  synthesize  new  motions  by  sampling  from  this  distribution 
and  smoothing  the  motions.  However,  they  have  to  define  certain  correlations  manually.  For 
example,  they  specify  manually  that  the  hip  angle  affects  the  knee,  and  the  knee  angle  affects 
the  ankle.  The  structure  learning  in  our  DBN  framework  learns  these  relationships  directly  from 
data. Their method only predicts the value of each DOF given the value of one other user-specified
DOF. In our case, we can find the probability of each DOF given the values of a subset
of  DOFs  across  previous  time  steps.  These  subsets  of  DOFs  are  learned  automatically  from  data; 
this process is a key component of the idea behind DBNs. Our method can be thought of as
a generalization of Pullen and Bregler’s approach [79]. In addition, they used their method to
animate a 2-dimensional 5-DOF wallaby figure, and a more complex 3D character in later work
[fMl. Since they have to manually specify the probability relationships between the DOFs of the
character, their approach does not work well for human-like characters with many DOFs. On the
other hand, we demonstrate results of different kinds of motions for a full human figure.

Our method is similar to the “Texturing” method by Pullen and her colleagues [80]. They also
used the idea that the joints of a human figure are correlated to predict the values for some DOFs
given the values of other DOFs. Our DBN framework also depends on this observation to predict
new DOF values. Their objectives are different from ours: (i) their “synthesis” procedure allows
them to synthesize the values of some DOFs that did not exist before, and (ii) their “texturing”
procedure allows them to add details to some DOFs. Their methods allow animators to more
easily combine traditional keyframing methods with motion capture technologies. Our focus,
on the other hand, is to generate variations of motions from input data. More specifically, their
algorithm has to break the input motions into segments and align them. Our method does not
require synchronization or alignment of similar motion clips. In addition, their method takes
existing motion curves from different joints, and combines and re-orders them to generate new
sequences. They do not explicitly change the values of the motion curves. For our case, it is
important that we can generate poses that are completely different (in terms of values) from those
in the inputs. The exact poses should be different but the overall motion should be a variation of
the inputs.

Our approach is similar to previous work in texture synthesis itTSl 110711. An important idea
for texture synthesis is to use a non-parametric approach and directly sample queries from data.
Our method is similar because we also use a non-parametric regression approach to predict values.
We first tried to use a parametric approach, but that did not work well. This might be due
to the small amount of input data that we have. We also have temporal relationships in addition
to spatial ones in our DBN framework. Other texture synthesis methods [5] model the input
texture with statistical methods. New and larger textures are generated to be statistically similar
to the inputs. We also learn a model of the input motions to synthesize new motions. In addition,
Moradoff and his colleagues llTOll re-order short segments of motion clips to generate new sequences
of motions. Their method is similar to motion graph techniques for re-ordering existing
motion sequences to generate new sequences. However, the individual frames of motion are not
new. In our case, we generate variants where individual frames are different from those of the
inputs.

Sidenbladh and her colleagues developed models of cyclic human motion in their vision
system for tracking human figures [TT^ W2\. Their model can potentially be used to generate
variations of the input cycles, but it was designed for (and is therefore more appropriate for) a
tracking system. Their model allows them to compute the probability of a pose. This gives the
prior probability that works well for constraining the search in a Bayesian framework for tracking.
Their model is thus not designed to generate new poses. In contrast, our generative model is good
for synthesizing new motions. We learn a “distribution” of the motions, and sample from this
distribution to get new motions. Furthermore, their algorithm needs to detect, segment, and align
the cycles of motion. This process loses the temporal information in the data. Our model takes
the temporal relationships into account, without having to align the input motion cycles.

Similar to the non-parametric texture synthesis methods [fT^ll07ll, Sidenbladh’s system
also used a non-parametric sampling-based approach [|9^. Our model also uses this approach
since we use previous frames of motions to predict future ones. More specifically, we
use the correlations between joint angles and correlations in motions over time to predict future
DOF values. The important idea here is that the prediction of each future DOF value is based
on some subset of previous values, and this has a conditional probability distribution. Each DOF
value is independent of other values given the particular subset used to predict that DOF. These
conditional distributions allow us to build a joint distribution over the motion trajectories for all
DOFs, and this joint distribution allows us to sample new variations of motions. Sidenbladh’s
framework is designed for tracking, and their synthesis technique simply adds Gaussian noise
to introduce variety into new motions. This process has the same disadvantages as methods
(discussed earlier) that add noise to existing motions.
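
The way per-DOF conditional distributions combine into a joint distribution over a trajectory can be written compactly. The notation below is an assumption made for illustration and does not appear in the original text: $x_t^d$ is the value of DOF $d$ at frame $t$, $D$ is the number of DOFs, $\mathrm{Pa}(x_t^d)$ is the subset of earlier values used to predict $x_t^d$, $k$ is the number of previous frames considered, and the first $k$ frames are taken as given.

```latex
\[
P\!\left(x_{k+1:T} \mid x_{1:k}\right)
  \;=\; \prod_{t=k+1}^{T} \, \prod_{d=1}^{D}
        P\!\left(x_t^{d} \,\middle|\, \mathrm{Pa}(x_t^{d})\right),
\qquad
\mathrm{Pa}(x_t^{d}) \subseteq
  \left\{\, x_{t-j}^{d'} : 1 \le j \le k,\ 1 \le d' \le D \,\right\}
\]
```

Sampling a new trajectory then amounts to drawing each $x_t^d$ in turn from its conditional distribution, frame by frame.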

Machine learning has been used for learning models of human motion data. Li and his
colleagues llMll generated new motions that are statistically similar to their input data. However,
they used 20 minutes of dancing motion as training data. If a large amount of data is available,
it is possible to just randomly replay or re-organize certain motion clips without a viewer being able to
detect repetition in the motion. One of the strengths of our work is that our approach can handle
a small amount of original data. Grochow and his colleagues ||3T1 developed a method to learn a
probabilistic model of input poses. They use their model to generate new poses in an optimization
framework  for  doing  inverse  kinematics  (IK):  the  idea  is  to  find  a  pose  that  satisfies  certain  IK 
constraints  and  maximizes  the  probability  of  the  pose  at  the  same  time.  Our  IK  framework  was 
inspired by their work. Our idea is to find solutions that satisfy three constraints: the solution
should be close to the value predicted by the DBN, adjacent poses should be smooth, and the
IK constraints for the foot/hand should be met. Yu and Terzopoulos [112] use a Bayesian Network (BN) to
model  the  probabilistic  nature  of  motions.  In  their  model,  each  of  the  nodes  or  random  variables 
is  a  high-level  motion  such  as  walking.  The  model  then  specifies  how  likely  a  character  will 
transition  to  other  motions  from  walking.  The  whole  motion  includes  a  small  number  of  different 
motions  and  the  possible  transition  between  them.  However,  these  transition  parameters  were 
not  learned  from  data,  but  were  specified  manually.  Their  contribution  is  to  argue  that  these 
kinds  of  models  can  be  used  to  model  the  probabilistic  and  uncertain  nature  of  motions.  We 
use a Dynamic Bayesian Network model in our work. The “Dynamic” refers to the temporal
relationships between the random variables, which are not modeled explicitly by BNs. We learn
such a model from data, and use it to synthesize completely new motions. It is also interesting to
note  that  Yu  and  Terzopoulos’  model  is  similar  to  our  Behavior  Planning  framework.  Their  key 
idea  is  to  use  a  Bayesian  Network  to  model  the  motions,  whereas  our  approach  uses  a  planning 
framework  to  synthesize  long  sequences  of  motions  automatically  for  characters  navigating  in 
a complex environment. Finally, Hertzmann [35] mentioned the idea of using DBNs to model
motion  data.  Our  work  uses  the  idea  of  DBNs  to  solve  the  problem  of  modeling  and  synthesizing 
variations  in  motion  data. 

We model our motion data with a Dynamic Bayesian Network (DBN) [24, 27]. A DBN
represents  the  state  of  a  set  of  random  variables  and  how  they  change  over  time.  In  our  case,  each 
random  variable  is  a  DOF  of  the  human  skeleton.  The  DBN  represents  a  multivariate  probability 
distribution  of  how  these  DOFs  change  over  time.  The  variations  that  we  generate  are  sampled 
from  this  distribution.  BNs  and  DBNs  have  been  used  for  many  applications,  including  medical 
diagnosis,  automated  help  features  in  software,  and  automated  traffic  prediction.  As  a  specific 
example, automated traffic prediction [22] collects data on each car and its nearby cars, and
makes  predictions  on  their  traffic  patterns.  The  data  that  they  collect  for  each  car  include:  its 
own  velocity,  the  position  and  velocity  of  nearby  cars,  and  whether  or  not  there  is  a  car  to  its  left 
and  right.  They  can  then  learn  a  DBN  with  random  variables  corresponding  to  these  high-level 
descriptions  (such  as  the  position  of  a  nearby  car).  A  potential  prediction  that  can  be  made  from 
the DBN is: if there is a car to my left and it is moving much faster than I am, then it
is  more  likely  that  there  will  not  be  a  car  to  my  left  at  the  next  time  step.  Such  predictions  can 
be  potentially  used  for  automated  driving  and  safety  warning  systems.  These  models  seem  to 
be  ideal  for  modeling  and  synthesizing  variations  in  motion  data,  because  they  model  the  joint 
probability  distribution  of  time-series  data.  The  advantages  of  using  DBNs  for  this  problem 
include:  the  variations  will  not  be  just  adding  noise  as  in  previous  methods;  the  new  output 
motions  will  be  statistically  similar  to  the  inputs;  we  only  need  a  small  amount  of  data;  and  no 
timewarping  is  needed  for  the  input  motion  clips. 
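
To give a feel for sampling a new motion from conditional distributions over previous-frame DOF values, here is a small Python sketch. It is a toy under stated assumptions, not the model used in this thesis: the dependency structure `parents` is supplied by hand instead of being learned, each DOF depends only on the immediately preceding frame, and the conditional distribution is approximated by Gaussian-kernel-weighted resampling of training values with an arbitrary `bandwidth`. All names are hypothetical.

```python
import math
import random


def sample_next(train_inputs, train_outputs, query, bandwidth, rng):
    """Draw one training output, weighted by a Gaussian kernel on input distance.

    Note: if every weight underflowed to zero, rng.choices would raise ValueError.
    """
    weights = [
        math.exp(-math.dist(x, query) ** 2 / (2 * bandwidth ** 2))
        for x in train_inputs
    ]
    return rng.choices(train_outputs, weights=weights, k=1)[0]


def synthesize(motion, parents, n_frames, bandwidth=0.1, seed=0):
    """Generate n_frames frames from a two-slice model of `motion`.

    motion:  list of frames, each a list of DOF values.
    parents: parents[d] lists the DOF indices at frame t-1 that DOF d depends on.
    """
    rng = random.Random(seed)
    # Training pairs per DOF: parent values at frame t-1 -> DOF value at frame t.
    inputs = [[[f[j] for j in pa] for f in motion[:-1]] for pa in parents]
    outputs = [[f[d] for f in motion[1:]] for d in range(len(parents))]
    out = [list(motion[0])]
    for _ in range(n_frames - 1):
        prev = out[-1]
        frame = []
        for d, pa in enumerate(parents):
            query = [prev[j] for j in pa]
            frame.append(sample_next(inputs[d], outputs[d], query, bandwidth, rng))
        out.append(frame)
    return out
```

Because each DOF is sampled separately given its parents, the generated frames can combine DOF values in ways that never occurred together in the input, although in this simplified version every individual value is still copied from the training data.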

Bayesian  Networks  (BNs)  have  been  used  in  the  animation  community  for  solving  different 
problems,  in  contrast  to  the  variation  problem  that  we  explore.  BNs  have  been  used  to  model 
the  motion  of  virtual  humans  [I112L  where  the  variables  in  their  network  correspond  to  high- 
level  behaviors.  Kwon  and  his  colleagues  [f50l  use  DBNs  for  the  problem  of  animating  the 
interactions  between  two  human-like  characters.  Ikemoto  and  her  colleagues  Il3^  use  similar 
types  of  probabilistic  methods  for  the  problem  of  motion  editing. 

There  is  much  interest  in  the  problem  of  adding  variety  to  virtual  crowds.  Maim  and  his 
colleagues [64] take a fixed number of template character meshes, and vary them by changing
their color and adding different accessories to them. On the other hand, our work takes a fixed
number of template motions and synthesizes new variant motions from them. McDonnell and her
colleagues  WTl  perform  user  experiments  to  study  the  perception  of  clones  in  virtual  crowds. 
They  assume  that  the  motion  of  crowds  of  characters  must  be  cloned.  In  contrast,  our  approach 
creates  motion  with  no  exact  clones,  even  though  the  new  variants  are  visually  similar.  The  focus 
of  their  paper  is  to  study  the  perception  of  appearance  and  motion  clones,  whereas  our  approach 
tries  to  model  input  data  and  synthesize  new  motion  variants. 

There  has  been  work  on  learning  the  style  of  motions  from  training  data  @  and  transferring 
the  style  between  motions  [[36l.  Style  and  variation  differ  in  the  following  way:  a  happy  walk 
and  a  sad  walk  are  different  styles  of  walking,  while  two  happy  walks  are  different  variations  of 
a  motion.  Interpolation  methods  [[Ml  I108II  have  been  developed  to  generate  a  spectrum  of  new 
motions  that  are  interpolated  from  the  original  data.  Interpolation  and  variation  are  also  different 
approaches:  we  interpolate  a  five-foot  jump  and  a  ten-foot  jump  to  get  an  eight-foot  jump,  while 
we  take  two  five-foot  jumps  to  generate  variations  of  that  jump. 

There  is  work  in  the  motor  control  community  that  tries  to  explain  variability  that  has  been 
observed in many tasks. For example, Todorov and Jordan [100, 101] explain that one optimal
strategy of motor control is to allow for variability in redundant dimensions. They describe a
“minimal intervention” principle where deviations are only corrected if they interfere with task
goals. From their perspective, variability is not viewed as noise or error, but as a useful
component of achieving certain tasks. This supports our approach of assuming that variation is
not just a separate additive noise component (as previous work does). We do not provide a new
perspective  to  explain  variability,  but  instead  we  use  a  data-driven  approach  to  actually  model 
variation  from  example  data  and  synthesize  new  variants  from  input  data. 

Our  Variation  Approach:  Additional  value  over  previous  work.  We  feel  that  the  area  of 
modeling  and  synthesizing  motion  variation  is  largely  unexplored.  The  major  previous  method 
for generating variations in human motion is to add noise to existing motions. However,
biomechanical research [IM11901 has shown that variation should not be viewed as just “error” or noise.
Variation is rather a functional component of motion. We therefore provide a data-driven
approach to take a few examples of a particular type of motion, model these inputs with a DBN,
and generate spatial and temporal variants of the original inputs. The reason for using a DBN is
that, in the machine learning literature, DBNs have been developed to model the joint probability
distributions of time-series data. Hence this model works well for building a distribution of
existing motion data, in order to model and synthesize variants of the original data.

Chapter 3

Behavior  Planning 


While it is relatively easy to create virtual models of elements in a static scene, it is challenging
to populate these scenes with autonomous characters. Creating 3D models of objects or even a
3D model of an entire city has become increasingly easy. One important challenge is to generate
realistic virtual humans: we want these virtual cities to be inhabited by lifelike and purposeful
characters. Generating these autonomous characters is important for applications such as games
and crowd simulations. Currently, these systems often contain many simple scripted behaviors or rules to
govern  how  the  characters  behave.  Our  Behavior  Planning  approach  applies  motion  planning 
techniques  to  automatically  generate  the  motions  for  a  large  number  of  characters. 

We  developed  our  method  with  these  properties  in  mind: 

•  We  provide  a  simple  goal-driven  interface  to  direct  where  the  characters  should  go.  This 
interface  allows  the  user  to  have  a  high-level  control  of  the  autonomous  characters. 

•  We  develop  a  single  planning  scheme  that  uses  a  combination  of  existing  motion  planning 
techniques.  Our  scheme  can  handle  multiple  characters  and  dynamic  obstacle  avoidance. 

•  Our  behavior  graph  data  structure  leads  to  our  Precomputed  Search  Trees  method  (next 
chapter),  which  is  much  more  efficient  than  our  basic  planning  approach. 

The key difference between our planning approach and previous planning methods is the
space in which we plan. Figure 3.1 shows the possible range of planning
configuration  spaces  for  generating  the  motions  of  a  human-like  character.  On  the  left  of  the 
figure,  we  have  the  simplest  type  of  planning:  we  consider  only  a  2D  gridmap  of  the  environment. 
In  this  case,  we  have  a  bird’s  eye  view  representation  of  the  environment.  The  character  is 
bounded  by  a  cylinder  for  collision  detection  purposes,  and  a  2D  solution  path  is  generated.  The 
character’s motion can then be added on with, for example, a proportional-derivative controller

Figure 3.1: The range of planning configuration spaces for generating the motions of a human-like
character. Our Behavior Planning approach differs from previous methods in that it lies in
between the two extremes in this range.

[49]. This 2D gridmap planning is the most efficient: we cannot further reduce the dimensions
of  the  search  space.  However,  the  solution  only  gives  us  a  simple  2D  path  and  no  human  motion 
is  actually  generated  by  the  planning  algorithm.  On  the  right  of  the  figure,  we  can  plan  for  all  the 
degrees  of  freedom  (DOFs)  of  the  human-like  character  at  the  same  time.  The  solution  would 
give  us  the  motions  for  all  the  DOFs.  However,  this  is  inefficient  and  intractable  due  to  the  large 
number  of  dimensions  (typically  50  to  60)  in  human  motion.  It  is  also  very  difficult  to  guarantee 
that  the  motions  be  humanlike.  Hence  no  previous  work  that  we  know  of  plans  directly  in  this 
space.  This  is  an  extreme  possibility  that  we  describe  here  for  the  sake  of  thinking  about  the 
possible  approaches  that  we  can  take. 

In  this  chapter,  we  present  our  Behavior  Planning  approach  [|5^.  This  approach  lies  in 
between  the  two  extremes  in  the  range  of  planning  spaces.  We  segment  motion  data  into  short 
motion  clips  of  high-level  behaviors  such  as  jogging  forward  or  jumping.  Our  method  then 
applies  planning  techniques  at  the  level  of  motion  clips;  the  actions  of  the  planning  method  are 
these  whole  clips.  Thus,  we  abstract  the  motion  clips  in  a  way  that  the  motion  synthesis  problem 
becomes  a  planning  problem.  We  generate  sequences  of  these  motion  clips  as  output. 

Planning  at  the  level  of  these  motion  clips  is  important  to  the  success  of  the  algorithm.  The 
advantages  include: 

•  Ideal  for  Navigation  Motion.  Our  approach  works  well  for  generating  motion  for  many 
characters  navigating  in  complex  and  dynamic  environments.  Because  of  the  motion  clip 
abstraction, a surprisingly small data set is enough for generating a variety of output motion.
We can start with a basic setup for motion generation with one character in static
environments.  We  can  then  apply  a  collection  of  planning  techniques  to  extend  this  basic 
setup.  For  example,  we  add  time  as  another  dimension  in  the  planning  space  to  handle 
dynamic  obstacles,  and  we  use  prioritized  planning  to  handle  multiple  characters. 

•  Efficiency. Our data structure is structured and compact. This allows for a small branching
factor in our search, and hence a fast algorithm. Since we have already segmented our
motion data to the level of meaningful motion clips, the number of nodes in our graphs (25
in our largest example) is relatively small compared to motion graph approaches [l4ll^[59ll.
Each node of our behavior graphs contains entire motion clips. In contrast, motion
graphs typically have on the order of thousands of nodes corresponding to individual poses
of motion (and potentially tens of thousands of edges). Gleicher and his colleagues [28]
described how unstructured motion graphs can be inappropriate for interactive systems that
require fast response times.

•  Memory Usage. Our method requires a small amount of data in order to generate interesting
motions, making it particularly appealing to resource-limited game systems. As an
example, our synthesized horse motions are generated from only 194 frames of data.

•  Intuitive Structure. Because of the structured form of our behavior graph, the solutions
that the planner returns can be understood intuitively. The high-level structure of behaviors
makes  it  easier  for  a  non-programmer  or  artist  to  understand  and  work  with  our  system. 
For  example,  a  virtual  character  that  wants  to  retrieve  a  book  from  inside  a  desk  in  another 
room  needs  to  do  the  following:  exit  the  room  it  is  in,  get  to  the  other  room,  enter  it,  walk 
over  to  the  desk,  open  the  drawer,  and  pick  up  the  book.  It  is  relatively  difficult  for  previous 
techniques  to  generate  motion  for  such  a  long  and  complex  sequence  of  behaviors.  Because 
we have already partitioned the motions into distinct high-level behaviors, our planner can
generate  a  sequence  of  behaviors  corresponding  to  the  high-level  descriptions  above. 

•  Generality. We can apply our algorithm to different characters and environments without
having to design new behavior graphs. For example, we generated animations for a
skateboarder  and  a  horse  using  essentially  the  same  graph. 

•  Optimality.  Again  because  of  the  structured  nature  of  our  graph,  our  method  can  compute 
optimal sequences of behaviors. Optimality is with respect to the cost of the behaviors. The
cost  of  a  particular  behavior  is  the  distance  that  the  character  travels  multiplied  by  a  user 
weight,  which  by  default  is  one. 


3.1  Problem  Statement  and  Overview 

The algorithm takes as input a graph of behaviors, information about the environment, and starting
and goal locations for each character. Figure 3.2 shows an example. The problem is to find a
sequence of behaviors or motions that allows each character to move from the start to the goal.

Figure 3.2: Left: The problem inputs include a description of the environment, a starting position
(larger  green  dot)  and  orientation  (smaller  green  dot  points  toward  the  direction),  and  a  goal 
position  (red  dot).  Right:  The  output  is  a  motion  sequence. 

The  sections  in  this  chapter  discuss  each  of  these  parts  in  more  detail: 

Behavior  Graph.  Motion  clips  are  abstracted  as  high-level  behaviors  and  associated  with 
a  behavior  graph  that  defines  the  movement  capabilities  of  the  character.  We  describe  how  to 
construct  this  graph,  explain  the  costs  associated  with  the  motion  clips,  and  provide  details  about 
the  data  that  we  used. 

Environment  Abstraction.  We  describe  the  environment  representation,  which  is  used  mostly 
for  collision  avoidance.  Some  interesting  cases  include:  obstacles  that  the  character  must  jump 
over  or  duck  under,  and  dynamic  obstacles. 

Behavior Planner. We describe an A*-search planner [15^ that performs a global search of
the nodes in the graph and computes a sequence of behaviors for each character to reach its
goal position.

Motion  Generation  and  Blending.  We  convert  the  sequence  of  behaviors  into  motions  by 
concatenating  and  blending  the  motion  clips. 

Results. We show results of synthesized animations with multiple characters planning
simultaneously in both static and dynamic environments.

3.2  Behavior  Graph 

The  behavior  graph  defines  the  movement  capabilities  of  the  character.  Each  node  or  behavior  of 
the  graph  consists  of  a  collection  of  motion  clips  that  represent  a  high-level  behavior,  and  each 
directed edge represents a possible transition between two behaviors. Figure 3.3 (left) shows a

Figure 3.3: Left: A simple graph of behaviors. Each node or behavior contains a set of example
motion clips for that behavior. Each edge indicates allowable transitions between behaviors.
Right: An example graph used for a human character that includes special jumping and crawling
behaviors.

simple example. The start node allows the character to transition from standing still to jogging,
and the end node allows the character to transition from jogging to standing still.

Most of the motion clips are segmented so that the beginning and end poses are similar.
Transitions are possible if the end of one clip is similar to the beginning of another. Hence most
clips can transition to each other, allowing for a larger variety of possible output
motions. The edges of the graph encode these transition possibilities.

In practice, we have found that it is a good idea to include some motion clips that are relatively
short compared to the length of the expected solutions. This makes it easier for the planner to
globally arrange the clips in a way that avoids the obstacles even in cluttered environments.

There can be multiple motion clips within a behavior in the graph. Having multiple clips that
differ slightly in the style or details of the motion adds to the variety of the synthesized motions,
especially if there are many characters utilizing the same graph. However, clips of motions in
the same node should be fairly similar at the macro scale, differing only in the subtle details. For
example, if a “jog left” clip runs a significantly longer distance than another “jog left” clip, they
should be placed in different nodes and assigned different costs.
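
As a minimal sketch of the structure described above, a behavior graph can be held as a dictionary of nodes, each carrying its example clips and its outgoing transitions. The names `Behavior`, `build_graph`, and `allowed_actions`, and the clip identifiers used with them, are illustrative assumptions and are not taken from our implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Behavior:
    """One node of the behavior graph (hypothetical layout)."""
    name: str
    clips: list                                      # example motion clips for this behavior
    successors: list = field(default_factory=list)   # names of behaviors reachable next


def build_graph(edges, clips):
    """Create nodes from {behavior name: clip list} and add the directed edges."""
    nodes = {name: Behavior(name, list(clip_list)) for name, clip_list in clips.items()}
    for src, dst in edges:
        nodes[src].successors.append(dst)
    return nodes


def allowed_actions(graph, current, rng=random):
    """Return one randomly chosen clip for each behavior reachable from `current`."""
    return [(name, rng.choice(graph[name].clips)) for name in graph[current].successors]
```

With several clips in a node, repeated calls may pick different clips for the same behavior, which is one source of variety when many characters share a graph.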

The cost of each clip is computed as the distance that the root position travels multiplied by
a user weight, which by default is one. The distance that the root position travels is the length
of the corresponding 2D path if we project the root position of the character to the ground plane.
Each node in the graph has only one cost. For multiple clips in the same node, we take the
average of the cost of each clip. The user weight allows an animator to control preferences for
certain types of behaviors.
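
The cost rule above can be sketched in a few lines. This is an illustration only: it assumes that each clip is given as a list of (x, y, z) root positions with y as the vertical axis, and the function names are hypothetical.

```python
import math


def path_length_2d(root_positions):
    """Length of the root trajectory after projecting it onto the ground (x, z) plane."""
    total = 0.0
    for (x0, _y0, z0), (x1, _y1, z1) in zip(root_positions, root_positions[1:]):
        total += math.hypot(x1 - x0, z1 - z0)
    return total


def clip_cost(root_positions, user_weight=1.0):
    """Cost of one clip: projected travel distance times the user weight."""
    return user_weight * path_length_2d(root_positions)


def node_cost(clips, user_weight=1.0):
    """Cost of a node: the average cost of the clips it contains."""
    costs = [clip_cost(clip, user_weight) for clip in clips]
    return sum(costs) / len(costs)
```

Raising the user weight of a behavior makes the planner less likely to choose it, since its clips then appear more expensive than their travel distance alone would suggest.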

Figure 3.3 (right) shows an example of the graph used for the human-like character. The
most complicated graph that we used has a similar structure, except for: (1) additional jogging
and turning states; and (2) more specialized behaviors such as jumping. We used seven types of
jogging behaviors: one moving forward, three types of turning left, and three of turning right. We
also  have  a  few  jogging  behaviors  that  are  relatively  shorter  in  length.  In  addition  to  “jump”  and 
“crawl”,  there  are  also  states  for:  duck  under  an  overhanging  obstacle,  different  types  of  jumps, 
and  stop-and-wait.  The  “stop-and-wait”  behavior  allows  the  character  to  stop  and  stand  still  for 
a  short  while  during  a  jog.  For  our  experiments,  the  raw  motion  capture  data  was  downsampled 
to  30  Hz.  Our  most  complicated  graph  has  1648  frames  of  data,  25  states,  and  1  to  4  motion 
clips  per  state.  Each  of  the  seven  types  of  jogging  mentioned  above  has  about  25  frames  of  data 
per  motion  clip.  Each  of  the  specialized  behaviors  has  about  40  to  130  frames,  and  each  of  the 
shorter  jogging  states  has  about  15  frames.  In  addition,  there  are  8  frames  of  data  before  and 
after  each  motion  clip  that  are  used  for  blending. 

We  also  have  motion  data  for  a  skateboarder  and  a  horse.  Their  graphs  are  similar  to  the  one 
in Figure 3.3 (right). For the skateboarder, there are five gliding behaviors: one moving forward,
two types of left turns, and two types of right turns. In addition, there are states for jumping,
ducking, and stopping-and-waiting. The graph has 835 frames of data, 11 states, and 1 motion
clip per state. Each motion clip has about 60 frames of data, and an additional 16 frames used
for blending. For the horse, we only had access to one keyframed forward gallop motion. We
defined  a  total  of  five  galloping  behaviors:  one  moving  forward,  two  types  of  turning  left,  and 
two  turning  right.  All  turning  motions  were  keyframed  from  the  forward  motion.  The  graph  has 
194  frames  of  data,  5  states,  and  1  motion  clip  per  state.  Each  of  the  clips  consists  of  20  frames 
of  data,  and  an  additional  12  frames  used  for  blending. 


3.3  Environment  Abstraction 

For static environments, we represent the environment e as a 2D heightfield gridmap. This map
encodes  the  obstacles  that  the  character  should  avoid,  the  free  space  where  the  character  can 
navigate,  and  information  about  special  obstacles  such  as  an  archway  that  the  character  can  crawl 
under.  This  information  can  be  computed  automatically  given  the  arrangement  of  obstacles  in  a 
scene.  The  height  value  is  used  so  that  we  can  represent  terrains  with  small  slopes  or  hills. 

The  virtual  character  is  bounded  by  a  cylinder  with  radius  r.  The  character’s  root  position  is 
the  center  of  this  cylinder.  The  character  is  not  allowed  to  go  anywhere  within  a  distance  r  of  an 
obstacle.  As  is  standard  in  robot  path  planning,  we  enlarge  the  size  of  the  obstacles  by  r  so  that 
the character can then be represented as a point in the gridmap [63].
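
The obstacle enlargement step can be illustrated on a boolean occupancy grid. The sketch below is an assumption-laden simplification: it ignores the height values of our gridmap, measures the radius in grid cells, and uses the hypothetical name `inflate_obstacles`.

```python
import math


def inflate_obstacles(grid, radius_cells):
    """Return a new grid in which every obstacle cell is grown by `radius_cells`.

    grid[row][col] is True where there is an obstacle. After inflation the
    character can be treated as a point: any free cell is a valid root position.
    """
    rows, cols = len(grid), len(grid[0])
    reach = math.ceil(radius_cells)
    inflated = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            # Mark every cell whose center lies within the radius of this obstacle cell.
            for dr in range(-reach, reach + 1):
                for dc in range(-reach, reach + 1):
                    rr, cc = r + dr, c + dc
                    in_bounds = 0 <= rr < rows and 0 <= cc < cols
                    if in_bounds and math.hypot(dr, dc) <= radius_cells:
                        inflated[rr][cc] = True
    return inflated
```

Because the inflation is computed once per map, collision checks during the search reduce to a single grid lookup per sampled position.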

In  order  to  handle  special  obstacles,  we  compute  a  set  of  near  regions  and  within  regions.  A 
near  region  is  where  the  character  is  near  the  obstacle  and  some  special  motions  such  as  jumping 
can be performed, and a within region is where the character can be in the process of executing
the special motions. We assume our environments are bounded by obstacles that prevent the
character  from  navigating  into  infinite  space. 

Each of the special motions such as crawling needs to be pre-annotated with the type of
corresponding special obstacle. In addition, the motion clips that are more complex can be optionally
pre-annotated with the time that the special motion is actually executed. For example, a long
jumping motion clip where the character might take a few steps before the jump can be annotated
with the time where the jump actually takes place. If there is no such annotation, we simply
assume  that  the  jump  occurs  in  the  middle  of  the  motion  clip. 

Our  algorithm  handles  dynamic  environments,  given  that  we  know  a  priori  how  each  obstacle 
moves. Given the motion trajectories of all the moving objects, we define a function E(t) that,
given a time t, returns the environment e at that time. For static environments, this function is
constant. 
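
One way to picture E(t) is as a closure over a static map and a set of known obstacle trajectories. The sketch below makes several assumptions that are not part of the text: time is an integer frame index, each moving obstacle occupies a single grid cell per frame, and an obstacle stays at its last cell once its trajectory ends.

```python
def make_environment_function(static_grid, trajectories):
    """Build E(t) from a static occupancy grid and known obstacle trajectories.

    static_grid: list of rows of booleans (True = blocked).
    trajectories: one list of (row, col) cells per moving obstacle, indexed by frame.
    """
    def E(t):
        grid = [row[:] for row in static_grid]   # copy so the static map is untouched
        for cells in trajectories:
            # Assumption: hold the last known cell after the trajectory ends.
            row, col = cells[min(t, len(cells) - 1)]
            grid[row][col] = True
        return grid
    return E
```

With an empty trajectory list the returned function yields the same map for every t, which matches the static case.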


3.4  Behavior  Planner 

Given  the  behavior  graph  and  environment  as  input,  we  search  for  a  sequence  of  behaviors  to 
allow each character to navigate towards its goal position. The search algorithm uses two
interrelated data structures: (1) a tree of nodes that is continually expanded during the search; and
(2)  a  priority  queue  of  nodes  ordered  by  cost,  which  represent  potential  nodes  to  be  expanded 
during  the  next  search  iteration.  Each  node  in  the  tree  stores  the  motion  clip  or  action  a  chosen, 
and  the  position,  orientation,  time,  and  cost.  This  means  that  if  we  choose  the  path  from  the  root 
node  of  the  tree  to  some  node  n,  the  position  stored  in  n  corresponds  to  the  character’s  global 
position  if  it  follows  the  sequence  of  actions  stored  along  the  path.  The  purpose  of  the  queue  is 
to  select  which  nodes  to  expand  next  by  keeping  track  of  the  cost  of  the  path  up  to  that  node  and 
expected  cost  to  reach  the  goal.  The  priority  queue  can  be  implemented  efficiently  using  a  heap 
data  structure. 

In  order  to  get  the  position,  orientation,  time,  and  cost  at  each  node  during  the  search,  we 
first  compute  automatically  the  following  information  for  each  action  a:  (1)  the  relative  change 
in  the  character’s  root  position  and  orientation,  (2)  the  change  in  time  (represented  by  the  change 
in  number  of  frames),  and  (3)  the  change  in  cost. 

The behavior planner uses an A*-search approach; pseudocode is shown in Algorithm 1.
The search algorithm is optimal with respect to the cost of the nodes. The total cost at each
node is the sum of the cost of the path up to that node and the expected cost to reach the goal
(DistToGoal). The planner initializes the root of the tree with the state s_init, which represents
the starting configuration of the character at time t = 0. The planner then iteratively expands the

Algorithm 1: Behavior Planner
    Tree.Initialize(s_init);
    Queue.Insert(s_init, DistToGoal(s_init, s_goal));
    while !Queue.Empty() do
        s_best ← Queue.RemoveMin();
        if GoalReached(s_best, s_goal) then
            return s_best;
        end
        e ← E(s_best.time);
        A ← F(s_best, e);
        foreach a ∈ A do
            s_next ← T(s_best, a);
            if G(s_next, s_best, e) then
                Tree.Expand(s_next, s_best);
                Queue.Insert(s_next, DistToGoal(s_next, s_goal));
            end
        end
    end
    return no possible path found;


lowest cost node s_best in the queue until either a solution is found, or until the queue is empty,
in which case there is no possible solution. If s_best reaches s_goal (within some small tolerance ε),
then the solution path from the root node to s_best is returned. Otherwise, the successor states of
s_best are considered for expansion in the tree.
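
The search loop can be sketched in Python as follows. This is a simplified illustration, not our implementation: the functions E, F, T, and G are passed in as parameters, the priority is taken to be the path cost plus the estimated distance to the goal, and the `State`, `Node`, and `solve_toy` names as well as the small grid domain are assumptions introduced only to make the example runnable.

```python
import heapq
import itertools
import math
from collections import namedtuple

State = namedtuple("State", "x z time cost")   # hypothetical minimal state


class Node:
    """Search-tree node: a state plus the parent node and action that led to it."""
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action


def plan(s_init, s_goal, E, F, T, G, goal_reached, dist_to_goal):
    """Return the action sequence from s_init to s_goal, or None if none exists."""
    tie = itertools.count()   # tie-breaker so the heap never compares Node objects
    queue = [(dist_to_goal(s_init, s_goal), next(tie), Node(s_init))]
    while queue:
        _, _, best = heapq.heappop(queue)
        if goal_reached(best.state, s_goal):
            actions = []
            while best.parent is not None:      # walk back to the root of the tree
                actions.append(best.action)
                best = best.parent
            return actions[::-1]
        e = E(best.state.time)
        for a in F(best.state, e):
            s_next = T(best.state, a)
            if G(s_next, best.state, e):
                priority = s_next.cost + dist_to_goal(s_next, s_goal)
                heapq.heappush(queue, (priority, next(tie), Node(s_next, best, a)))
    return None


MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]      # toy actions: unit steps on a grid


def solve_toy(blocked, size=3, start=(0, 0), goal=(2, 0)):
    """Run the planner on a small static grid whose blocked cells are given as a set."""
    visited = {start}

    def G(s_next, s_best, e):
        cell = (s_next.x, s_next.z)
        inside = 0 <= s_next.x < size and 0 <= s_next.z < size
        if not inside or cell in e or cell in visited:
            return False
        visited.add(cell)
        return True

    return plan(
        State(*start, 0, 0.0), State(*goal, 0, 0.0),
        E=lambda t: blocked,
        F=lambda s, e: MOVES,
        T=lambda s, a: State(s.x + a[0], s.z + a[1], s.time + 1, s.cost + 1.0),
        G=G,
        goal_reached=lambda s, g: (s.x, s.z) == (g.x, g.z),
        dist_to_goal=lambda s, g: math.hypot(g.x - s.x, g.z - s.z),
    )
```

In the toy domain, a wall at cells (1, 0) and (1, 1) forces a detour around the top of the grid, and adding (1, 2) to the wall makes the goal unreachable, in which case the queue empties and `None` is returned.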

The function F returns the set of actions A that the character is allowed to take from s_best.
This set is determined by the transitions in the graph. Some transitions may only be valid when
the character’s position is in the near regions of the special obstacles. Moreover, F randomly
selects a motion clip within each chosen node, if there are multiple clips in a node. The function
T takes the input state s_in and an action a as parameters and returns the output state s_out resulting
from the execution of that action. The function f represents the translation and rotation that may
take place for each clip of motion.
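
A possible form of the transition function T is sketched below. It assumes a planar state with a heading angle, a local frame in which +x is the character's forward direction, and per-action changes in position, orientation, frame count, and cost that were computed in advance from the clip. The `State` and `Action` field names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    x: float
    z: float
    theta: float   # heading in radians, measured in the ground plane
    time: int      # accumulated number of frames
    cost: float    # accumulated path cost


@dataclass(frozen=True)
class Action:
    name: str
    dx: float      # root displacement in the character's local frame
    dz: float
    dtheta: float  # change in heading over the clip
    frames: int    # duration of the clip in frames
    cost: float    # cost of the behavior that owns the clip


def T(s, a):
    """Apply action a to state s: rotate the local displacement into world space."""
    c, sn = math.cos(s.theta), math.sin(s.theta)
    return State(
        x=s.x + c * a.dx - sn * a.dz,
        z=s.z + sn * a.dx + c * a.dz,
        theta=(s.theta + a.dtheta) % (2 * math.pi),
        time=s.time + a.frames,
        cost=s.cost + a.cost,
    )
```

Since the per-action changes are fixed, expanding a node costs only a few multiplications and additions, regardless of how many frames the clip contains.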

The function G determines if we should expand s_next as a child node of s_best in the tree.
First, collision checking is performed on the position of s_next. This also checks the intermediate
positions of the character between s_best and s_next. The discretization of the positions between
these  two  states  should  be  set  appropriately  according  to  the  speed  and  duration  of  the  action. 
The  amount  of  discretization  is  a  tradeoff  between  the  search  speed  and  the  accuracy  of  the 
collision  checking.  For  the  special  actions  such  as  jumping,  we  also  check  to  see  if  the  character 
is  inside  the  within  regions  of  any  corresponding  obstacles  during  the  execution  of  the  action.  In 
the case of a jumping motion, for example, since we have annotated when the jump occurs, we
can add this time to the accumulated time at that point and use the total time to index the function
E.

To handle dynamic obstacles in the environment, our planning algorithm considers time as
another DOF [|5^. As we build our search tree of nodes, we keep track of the total time it takes
to arrive at each node and perform collision detection with respect to the obstacles at that time.
This assumes that the trajectories of the obstacles are known in advance. In the pseudocode, we
see that the time s_best.time is used to find the configuration of the environment e at that
time.

As a final step for function G, we utilize a state-indexed table to keep track of locations in the
environment that have previously been visited. If the global position and orientation of a potential
node s_next has been visited before, the function G will return false, thereby keeping it from being
expanded. This prevents the search from infinitely cycling between different previously explored
positions. Hence the algorithm will terminate in finite time if no solution exists.
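
A state-indexed table of this kind can be sketched as a set of discretized (position, orientation) keys. The cell size, the number of orientation bins, and the class and method names below are assumptions chosen for illustration; the text does not specify how the table is discretized.

```python
import math


class VisitedTable:
    """Remembers which discretized (x, z, heading) states have been expanded."""

    def __init__(self, cell_size=0.25, angle_bins=16):
        self.cell_size = cell_size
        self.angle_bins = angle_bins
        self._seen = set()

    def _key(self, x, z, theta):
        ix = math.floor(x / self.cell_size)
        iz = math.floor(z / self.cell_size)
        turn = (theta % (2 * math.pi)) / (2 * math.pi)        # fraction of a full turn
        ia = int(round(turn * self.angle_bins)) % self.angle_bins
        return ix, iz, ia

    def visit(self, x, z, theta):
        """Return True and record the state if it is new; return False otherwise."""
        key = self._key(x, z, theta)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Because a bounded environment contains only finitely many such keys, each of which can be accepted once, the number of expansions is bounded. A coarser discretization prunes more aggressively and speeds up the search, at the risk of discarding states that would have led to a valid path.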

3.5  Motion  Generation  and  Blending 

After the search algorithm returns a sequence of behaviors, we convert that sequence into actual
motion for the character. To smooth out the discontinuities where the transitions between
behaviors occur, we linearly interpolate the root positions and use a smooth-in, smooth-out slerp
interpolation function for the joint rotations.
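
The blending step for a single frame inside a transition window might look as follows. The sketch assumes unit quaternions stored as (w, x, y, z) tuples and uses the cubic smoothstep curve as the smooth-in, smooth-out weight; the text does not name a particular easing function, so that choice and the function names are assumptions.

```python
import math


def smoothstep(u):
    """Ease-in, ease-out weight for u in [0, 1]."""
    return u * u * (3.0 - 2.0 * u)


def lerp(p, q, u):
    """Componentwise linear interpolation between two tuples."""
    return tuple(a + (b - a) * u for a, b in zip(p, q))


def slerp(q0, q1, u):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                       # take the shorter arc
        q1 = tuple(-b for b in q1)
        dot = -dot
    if dot > 0.9995:                    # nearly identical: fall back to normalized lerp
        q = lerp(q0, q1, u)
        norm = math.sqrt(sum(c * c for c in q))
        return tuple(c / norm for c in q)
    omega = math.acos(dot)
    s0 = math.sin((1.0 - u) * omega) / math.sin(omega)
    s1 = math.sin(u * omega) / math.sin(omega)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


def blend_frame(root_a, root_b, rots_a, rots_b, u):
    """Blend one frame: linear root position, eased slerp for each joint rotation."""
    w = smoothstep(u)
    root = lerp(root_a, root_b, u)
    rots = [slerp(qa, qb, w) for qa, qb in zip(rots_a, rots_b)]
    return root, rots
```

Here u runs from 0 at the start of the blend window to 1 at its end, so the outgoing clip dominates early frames and the incoming clip dominates late ones.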

3.6  Results 

We synthesized motions for multiple characters by doing prioritized planning. We plan the
motion for the first character as usual; each additional character’s motion is then synthesized by
assuming that all previous characters are moving obstacles. Prioritized planning does not guarantee
a globally optimal solution for a given group of characters, as solving this multi-agent planning
problem is known to be PSPACE-hard [|5^. Although it is neither fully general nor optimal, we
have found that prioritized planning is efficient and performs very well in our examples.
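
The idea of prioritized planning can be illustrated on a small grid. The sketch below deliberately swaps in a breadth-first search over (cell, time step) pairs instead of our behavior planner, so that the example stays short. It also assumes that a character remains at its goal after arriving, and it does not prevent two characters from swapping adjacent cells. All names are hypothetical.

```python
from collections import deque

MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]   # (0, 0) is stop-and-wait


def occupied(cell, t, prior_paths):
    """True if an already-planned character is in `cell` at time step t."""
    return any(path[min(t, len(path) - 1)] == cell for path in prior_paths)


def plan_one(start, goal, size, prior_paths, horizon=50):
    """Shortest-time path on a size x size grid that avoids earlier characters."""
    queue = deque([(start, 0, [start])])
    seen = {(start, 0)}
    while queue:
        cell, t, path = queue.popleft()
        if cell == goal:
            return path
        if t >= horizon:
            continue
        for dx, dz in MOVES:
            nxt = (cell[0] + dx, cell[1] + dz)
            inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if not inside or (nxt, t + 1) in seen or occupied(nxt, t + 1, prior_paths):
                continue
            seen.add((nxt, t + 1))
            queue.append((nxt, t + 1, path + [nxt]))
    return None


def prioritized_plan(requests, size):
    """Plan characters one at a time; earlier paths act as moving obstacles."""
    paths = []
    for start, goal in requests:
        path = plan_one(start, goal, size, [p for p in paths if p is not None])
        paths.append(path)
    return paths
```

In a 3 x 3 grid, if the first character travels from (0, 0) to (2, 0) and the second from (1, 1) to (1, 0), the second character waits one step so that it does not enter (1, 0) while the first is passing through. The planning order matters: a different ordering can produce different paths, which is why the result is not guaranteed to be globally optimal.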

We present some experimental results that demonstrate the effectiveness of our approach.
Figure 3.4 shows three human characters navigating in an environment with a cylinder-shaped
tree obstacle that gradually falls down. The first character jogs past this obstacle before it falls,
while the two that follow jump over it after it has fallen. Our planner takes less than one second
to synthesize about 10 seconds of animation for each character. In general, the amount of search
time is significantly less than the amount of motion that the planner generates.

Figure 3.4: A dynamic environment with a falling tree. Left: Before it falls, the characters are
free  to  jog  normally  in  the  open  space.  Center:  As  it  is  falling,  the  characters  can  neither  jog  past 
nor  jump  over  it.  Right:  After  it  has  fallen,  the  characters  can  jump  over  it. 


Figure  3.5:  The  characters  avoid  each  other  and  the  dynamic  obstacles  (spheres). 


Figure 3.5 shows an example of twenty characters simultaneously avoiding each other and
a  number  of  moving  obstacles.  The  motions  of  the  moving  obstacles  are  pre-generated  from  a 
rigid-body  dynamics  solver.  Their  positions  at  discrete  time  steps  are  then  automatically  stored 
into  the  time-indexed  gridmaps  representing  the  dynamic  environment.  In  the  animation,  the 
characters appear to move intelligently, planning ahead and steering away from the moving
obstacles in advance.


3.7  Discussion 

We  have  presented  a  Behavior  Planning  approach  to  automatically  generate  realistic  motions  for 
animated  characters.  We  model  the  motion  data  as  abstract  high-level  behaviors.  Our  behavior 
planner then performs a global search of a data structure of these behaviors to synthesize
motion. The main contribution of our approach is in the abstraction of motion data such that we
can  apply  planning  techniques  to  efficiently  generate  goal-driven  navigation  motion  for  virtual 
human-like  characters.  We  have  shown  that  our  approach  works  well  for  generating  navigation 
motions. It is able to generate motions for many characters navigating simultaneously in
complex and dynamic environments. In addition, because of our small graph size, the search is
fast.

As there is a lot of previous work in the areas of planning and animation, we view our
approach as one motion planning method among a spectrum of methods (Figure [TT|). Our approach
differs from previous methods in that it has a carefully chosen planning space. This spectrum
of methods should be well understood before one method is chosen among them.

The  main  limitation  of  our  approach  is  that  we  need  to  manually  build  a  behavior  graph, 
and  have  motion  data  that  is  appropriately  segmented.  We  have  found  that  it  is  not  difficult  to 
construct  and  re-use  our  graphs  because  of  their  small  size.  However,  if  we  have  a  large  data  set 
that  we  wish  to  use,  then  it  might  be  more  appropriate  to  consider  motion  graph  techniques.  In 
addition,  we  assume  that  we  are  given  a  set  of  segmented  and  blendable  motion  clips  as  input. 
The  motion  clips  have  to  be  easily  blendable  into  the  other  ones,  in  order  for  them  to  be  blended 
smoothly  into  longer  sequences. 

Another  limitation  is  that  the  output  sequence  must  be  a  concatenation  of  the  input  clips, 
since  no  changes  to  the  original  input  motion  clips  are  made  in  our  current  system.  Hence 
our planner does not allow the virtual character to exactly match precise goal postures. Our
focus is on efficiently generating complex sequences of large-scale motions across large and
complex  terrain  involving  different  behaviors.  Given  a  small  number  of  appropriately  designed 
“go  straight”,  “turn  left”,  and  “turn  right”  actions,  our  planner  can  generate  motions  that  cover 
all  reachable  space  at  the  macro-scale.  No  motion  editing  is  required  to  turn  fractional  amounts 
or  traverse  a  fractional  distance  because  we  are  computing  motion  for  each  character  to  travel 
over  relatively  long  distances  (compared  to  each  motion  clip).  The  algorithm  globally  arranges 
the  motion  clips  in  a  way  that  avoids  obstacles  in  cluttered  environments  while  reaching  distant 
goals. The character stops when it is within a small distance ε from the goal location. If matching
a  precise  goal  posture  is  required,  motion  editing  techniques  [fT3l  I109H  may  be  used  after  the 
blending  stage. 

One important insight of our work is that the abstraction of motions as high-level behaviors
and the carefully chosen planning space lead to both the strengths and weaknesses of the
approach. The weakness is that we require segmented and blendable motion clips corresponding to
the high-level behaviors. The strength is that since the number of these behaviors and input clips
is small, the search algorithm is fast and works well for generating navigation motions. In
addition, we empirically found that using an extremely small data set is enough for generating many
types of navigation motions. For example, having five to seven jogging behaviors is enough to
generate navigation motions for many characters in large environments.

A  possible  direction  for  future  work  is  to  parametrize  the  nodes  in  the  graph.  Instead  of 
having  a  “jog  left”  behavior  state,  we  can  have  a  “jog  left  by  x  degrees”  state.  Such  a  state  might 
use  interpolation  methods  [[Ml  I108B  to  generate  an  arbitrary  turn  left  motion  given  a  few  input 
clips.  This  can  decrease  the  amount  of  input  data  needed,  while  increasing  the  variety  of  motion 
the planner can generate. We can also have behaviors such as “jump forward x meters over an
object of height h”. This would allow our system to work in a larger variety of environments.

Our behavior graph has a compact and structured form. This is important to the approach
presented in the next chapter, which vastly improves the efficiency of finding a sequence of
motion clips that allows a character to reach a goal.

Chapter 4

Precomputed  Search  Trees 


In the previous chapter, we described a method for autonomously generating the motions for
multiple characters navigating in a complex environment, given that we have a set of pre-segmented
motion clips. This previous technique applies traditional A*-search methods. It builds a search
tree during runtime and performs a forward search to find a solution path. We refer to this as a
forward search as it can solve problems with one specific start location and one goal location,
and a tree is built in the forward direction from the start towards the goal. However, the search
can be very slow if there are many characters and we have to build a tree to perform a search for
each character. In our examples in the previous chapter, this search cannot be done in real-time
if we have about 20 or more characters. The overall approach of taking one starting location and
one goal location for each character, and performing some kind of search to find a solution, is
typical of traditional forward search methods.

The goal of the work in this chapter is to develop a method to generate these motions much
faster than before, so that we can have a real-time framework for multiple characters navigating
in large and dynamic environments. This is an important problem for real-time graphics
applications because existing methods often do not scale well to a large number of characters,
especially if we want these characters to exhibit human-like motions.

This problem of how to develop a real-time framework motivated us to try to speed up our
Behavior Planning approach by exploring what we can compute in advance and what we have to
do during runtime. Instead of performing a forward search during runtime to solve the planning
problem with one starting location and one goal location, the key idea of our new approach [53]
is to first compute this search tree beforehand without considering the obstacles and start/goal
locations, given that we have some pre-segmented motion clips. We then use the precomputed
tree to efficiently find a solution for any configuration of obstacles and any start/goal queries.
The runtime process performs a backward search, as it begins from the goal and attempts to
backtrace a valid path towards the start. Hence we make a tradeoff of memory for speed: we
have to use extra memory to store this tree, but we get a much faster runtime for our search. We
found that, by doing this precomputation, we get a runtime that is two orders of magnitude faster
than traditional forward search methods.

There has been much work [|3[TS|2Tl|^|78l|89l that first computes a map of the environment
before motions are actually generated. This can be considered as a type of precomputation, but
these maps are usually built for a given environment and the maps have to be rebuilt for different
environments. More “dynamic” versions of these maps have been developed [l26ll9^[TTTl. The
key difference here is that they build a map given the environment. In our Precomputed Search
Trees approach, instead of starting with the environment, we start from the motions that are
available to the character and build a tree of the motions. We then deal with the obstacles only when
necessary as the character moves about in the environment. This has a number of advantages:
(i) we can re-use the same tree for arbitrary environments, for any obstacle configuration and
goal position in the same environment, and for different parts of a large environment; (ii) we can
use the same tree for all the characters in the scene; and (iii) we can handle dynamic obstacles
on-the-fly instead of taking them into account beforehand.

The contributions of our work in this chapter are twofold. First, we present a novel planning
approach based on precomputation: we precompute a search tree of possible motion paths and
then use a backward search method during runtime to solve planning queries. We experimentally
show that this approach is more than two orders of magnitude faster than traditional forward
search methods. Using our approach, we built an interactive system where multiple characters
continuously navigate and respond to user changes to obstacles and goal positions. Second, we
present a simple but effective randomized-based technique for precomputing large and diverse
trees. As there has been a lot of recent work on the topic of path diversity, we experimentally
compare the advantages and disadvantages of our technique with other methods for building
diverse trees.


4.1  Problem  Statement  and  Overview 

The problem setup is the same as in the previous chapter. The inputs to the system include: a
starting position and orientation, a goal position, a description of the environment geometry, and
a behavior graph of motion clips. In this chapter, we focus on synthesizing sequences of motions
efficiently. Figure 4.1 shows an overview of our system. The sections in this chapter discuss
each of these parts in detail:

Figure 4.1: Overview of the system.


Precomputation  of  Tree.  In  the  precomputation  phase,  we  build  a  search  tree  of  the  nodes  in 
the  behavior  graph.  At  this  stage,  we  do  not  take  into  account  the  obstacles  and  goal  positions. 
We first describe the exhaustive tree and the original “pruned” version [53] of the tree. We then
describe  a  simple  but  effective  randomized-based  technique  to  build  scalable  and  diverse  trees. 

Precomputation of Environment and Goal Gridmaps. This is also a precomputation step.
We  compute  gridmaps  over  the  tree  so  that  we  can  access  the  nodes  and  paths  in  the  tree  more 
efficiently  later. 

Mapping Obstacles to Environment Gridmap. This is the first main runtime step. The
locations of the obstacles are mapped to the environment gridmap, which allows for more efficient
collision checks during the runtime backward search.

Runtime  Backward  Search.  This  is  the  second  main  runtime  step.  This  phase  first  selects  a 
sub-goal  (which  can  just  be  the  final  goal)  that  we  want  to  reach.  We  perform  a  backward  (from 
goal  to  start)  search  in  the  precomputed  tree  to  find  a  sub-path  that  leads  to  the  sub-goal.  This 
sub-path  is  added  as  part  of  the  final  solution. 

Coarse-Level Planner. This is an optional step needed for distant goals. This step is performed
only once for each goal query, and precedes the two main runtime steps. If the final goal is far
away from the starting location, we first use a bitmap planner to search for a coarse path. This
coarse path is then repeatedly used to select intermediate sub-goals for use in repeated iterations
of the runtime phase. For each sub-goal, the two main runtime steps are performed.

Evaluation.  We  present  an  application  where  the  user  can  move  the  obstacles  and  the  goal 
locations,  and  the  virtual  characters  will  respond  to  these  changes  interactively.  We  compare  our 
tree  precomputation  methods  with  previous  methods  for  building  similar  types  of  diverse  trees. 
We  compare  our  overall  precomputation  approach  with  traditional  forward  search  methods. 

4.2  Precomputation  of  Tree 

This  is  the  first  precomputation  step.  We  do  not  take  into  account  the  obstacles  and  goal  positions 
yet.  We  take  the  behavior  graph  of  motions,  and  compute  a  tree  of  nodes  of  these  motions.  Each 
path  (of  nodes)  in  the  tree  corresponds  to  a  sequence  of  motions  (if  we  take  the  corresponding 
motion  clips  and  concatenate  them).  This  tree  represents  all  the  possible  paths  and  motions  that 
the  character  can  take  starting  from  any  position  and  orientation. 

In  all  three  cases  described  below,  each  node  of  the  tree  has  the  position,  orientation,  cost,  and 
time  of  the  path  up  to  that  node  starting  from  the  root  node.  Since  each  node  has  only  one  parent, 
we  can  trace  back  the  path  to  the  root  to  find  the  sequence  of  behavior  nodes  that  can  reach  that 
point.  Hence  each  node  also  represents  the  path  up  to  and  including  that  point.  Each  node  also 
has  a  blocked  variable,  initialized  here  with  UNBLOCKED.  If  this  variable  is  set  to  BLOCKED, 
this  means  we  know  for  sure  that  we  can  neither  reach  that  point  nor  any  of  the  corresponding 
descendant  nodes.  However,  the  path  from  the  root  to  the  parent  node  of  that  point  may  still  be 
reachable.  This  blocked  variable  is  used  in  the  runtime  steps. 
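
The per-node data described above can be sketched in Python as follows. This is a minimal illustration under our own naming assumptions (`TreeNode`, `add_child`, and `behavior_sequence` are hypothetical names), not the storage layout of the actual system. It shows how a single parent pointer per node is enough to recover the sequence of behaviors that reaches that node.

```python
from dataclasses import dataclass, field
from typing import List, Optional

UNBLOCKED, BLOCKED = 0, 1


@dataclass(eq=False)
class TreeNode:
    action: Optional[str]     # behavior-graph node; None for the root placeholder
    x: float                  # position of the path from the root up to this node
    y: float
    theta: float              # orientation at the end of that path
    cost: float               # accumulated cost from the root
    time: float               # accumulated duration from the root
    parent: Optional["TreeNode"] = field(default=None, repr=False)
    children: List["TreeNode"] = field(default_factory=list, repr=False)
    blocked: int = UNBLOCKED  # set to BLOCKED at runtime if this node cannot be reached


def add_child(parent, action, x, y, theta, step_cost, step_time):
    """Attach a child whose cost and time accumulate along the path from the root."""
    child = TreeNode(action, x, y, theta,
                     parent.cost + step_cost, parent.time + step_time, parent)
    parent.children.append(child)
    return child


def behavior_sequence(node):
    """Follow parent pointers to the root; return the actions in execution order."""
    actions = []
    while node.parent is not None:
        actions.append(node.action)
        node = node.parent
    actions.reverse()
    return actions


root = TreeNode(None, 0.0, 0.0, 0.0, 0.0, 0.0)
a = add_child(root, "jog", 0.0, 1.0, 0.0, 1.0, 0.5)
b = add_child(a, "jog_left", -0.5, 2.0, 0.3, 1.2, 0.6)
```

Here `behavior_sequence(b)` yields the two behaviors on the path to `b`; marking `a.blocked = BLOCKED` would, by the convention described above, also rule out `b` and any other descendant of `a`.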

We  describe  three  cases  of  tree  precomputation:  (i)  the  exhaustive  tree  includes  all  nodes  up 
to  a  certain  depth  level;  (ii)  the  “pruned”  tree  includes  a  subset  of  the  exhaustive  nodes;  and  (iii) 
the  scalable  and  diverse  randomized-based  tree  has  diverse  paths  that  scale  to  large  environments. 

4.2.1  Exhaustive  Tree 

We build a tree of all nodes up to a certain depth level. In general, the tree size is O((average b)^d),
where b is the branching factor and d is the depth level. If we build the exhaustive tree (Figure
4.2, left), its size grows exponentially with respect to depth level.

Figure 4.2: Frequency plot of the precomputed tree. Each point represents the number of paths
that can reach that point from the root of the tree. The root is near the middle of each figure and
the tree progresses in a forward direction (or up in the figure). The tree covers an area that is
approximately a half circle of radius 16 meters, with the character starting at the center of the
half circle. The majority of paths end up in an area between 8 and 14 meters away from the start.
We used about 1,500 frames of motion at 30 Hz. Left: Exhaustive tree of 6 depth levels built
from graph with 21 behavior nodes. This tree has over 6 million nodes (over 300 MB). Right:
The pruned tree has 220,000 nodes (about 10 MB).


The exhaustive tree finds optimal solutions, since it includes all paths up to a depth level.
However, this also means that the solutions are limited by that depth level. In practice, the path
sizes are small (i.e. up to 5 or 6 depth levels). The memory used to store a tree is large, up to
about 1 GB in our examples. Therefore, we do not recommend using the exhaustive tree. We
only discuss this case here for completeness.
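
To make the exponential growth concrete, the short Python sketch below counts the nodes of a full tree under the simplifying assumption of a uniform branching factor (the real behavior graph has a varying number of transitions per node, so this is only an estimate). The function names and the figure of 50 bytes per node are our own assumptions; the latter is simply the ratio of the "over 300 MB" and "over 6 million nodes" quoted in Figure 4.2.

```python
def exhaustive_tree_size(branching, depth):
    """Nodes in a full tree with uniform branching, from the root (level 0) to `depth`."""
    return sum(branching ** level for level in range(depth + 1))


def memory_mb(num_nodes, bytes_per_node=50):
    """Rough storage estimate; bytes_per_node is an assumed value."""
    return num_nodes * bytes_per_node / 1e6


for b in (13, 14):
    n = exhaustive_tree_size(b, 6)
    print(b, n, round(memory_mb(n)))
```

With 6 depth levels, a uniform branching factor of 13 gives 5,229,043 nodes and 14 gives 8,108,731 nodes, which brackets the tree in Figure 4.2 (left). Adding just two more levels at a branching factor of 13 multiplies the size by roughly 169, which is why the exhaustive tree is impractical beyond a few levels.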


4.2.2  Pruned  Tree 

The motivation for having a pruned tree is that the exhaustive case requires a very large amount
of memory and is not practical for actual use. The “pruned” version is the tree we developed in
[53] and used in our interactive character system. We use the term “pruned” tree because the
nodes are a subset of the ones in the exhaustive case. However, the tree construction process
starts from an empty tree and adds one node at a time.

We initialize the tree with an empty root node. We also initialize an empty grid similar to the
one in Figure 4.3 (left). To build the nodes in the d + 1 level, we consider all the child nodes of
the nodes in the d level. We must consider the transitions (as specified in the graph) when we
build this set of child nodes. We randomly go through this set of child nodes, and decide if each
one should be added. If the node’s corresponding cell (x_i, y_i) (see Figure 4.3, left) has fewer
than a prespecified number k of nodes already in it, we add the node and increment the number of
nodes in that cell. We thereby limit the number of nodes in each cell to k. In Figure 4.2 (right), k
is set to 100. We also limit the total number of nodes we build.
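
The level-by-level construction with a per-cell cap can be sketched as follows. This is a simplified illustration and not the implementation from our system: each action is reduced to a hypothetical (forward distance, turn angle) pair, the function name `build_pruned_tree` and its parameters are our own, and nodes are plain dictionaries.

```python
import math
import random
from collections import defaultdict


def build_pruned_tree(actions, transitions, depth, k, max_nodes, cell_size, seed=0):
    """actions: name -> (forward_distance, turn_angle); transitions: name -> allowed next names.
    Returns a list of node dicts; the 'parent' entry is an index into that list."""
    rng = random.Random(seed)
    nodes = [{"action": None, "x": 0.0, "y": 0.0, "theta": 0.0, "parent": None}]
    cell_count = defaultdict(int)      # number of nodes already stored per gridcell
    level = [0]                        # indices of the nodes at the current depth level
    for _ in range(depth):
        candidates = []
        for idx in level:
            parent = nodes[idx]
            # The root placeholder may transition to every action.
            allowed = actions if parent["action"] is None else transitions[parent["action"]]
            for name in allowed:
                dist, turn = actions[name]
                theta = parent["theta"] + turn
                candidates.append({"action": name,
                                   "x": parent["x"] + dist * math.cos(theta),
                                   "y": parent["y"] + dist * math.sin(theta),
                                   "theta": theta, "parent": idx})
        rng.shuffle(candidates)        # go through the child nodes in random order
        next_level = []
        for cand in candidates:
            if len(nodes) >= max_nodes:
                return nodes           # cap on the total number of nodes
            cell = (math.floor(cand["x"] / cell_size), math.floor(cand["y"] / cell_size))
            if cell_count[cell] < k:   # keep at most k nodes per cell
                cell_count[cell] += 1
                nodes.append(cand)
                next_level.append(len(nodes) - 1)
        level = next_level
    return nodes


actions = {"straight": (1.0, 0.0), "left": (1.0, 0.5), "right": (1.0, -0.5)}
transitions = {name: list(actions) for name in actions}
pruned = build_pruned_tree(actions, transitions, depth=4, k=2, max_nodes=10**6, cell_size=1.0)
```

Setting k very large recovers the exhaustive tree for the same depth, which is a convenient way to check that the pruned nodes are indeed a subset of the exhaustive ones.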
4.2.3 Scalable and Diverse Randomized-based Tree


Motivation. We have already used the “pruned” tree in the overall precomputation approach
to show that the concept of precomputation can be useful [53]. However, two issues remain with
the “pruned” version of the tree.

The first issue is that the pruned trees are so small (typically up to 5 or 6 depth levels) that
it is either not possible or difficult to use them in real planning problems (requiring solutions of
up to 50 levels in our experiments). This is also the case for previous work that builds trees with
diverse paths [10, 30]. Furthermore, these methods [10, 30] do not scale in terms of the time and
memory needed for precomputing trees of a reasonable size (i.e. even just 8 depth levels or
greater). These methods require a set of paths to prune from, and it is not clear how we can get
this set to begin with. Building the exhaustive tree for even a small depth level is extremely
memory intensive, and starting with a subset of paths to prune from is problematic as it is not
clear where this subset comes from. Hence we would like to generate scalable trees. By scalable,
we refer to the large size of the tree, in terms of the length of the paths. These trees should also
scale to large environments.

The second issue is that many of the paths in our pruned trees are still somewhat redundant,
as they may cover similar regions or they may partially overlap with each other. We would
therefore like to build trees with more diverse paths. By diverse, we intuitively mean that the
paths should be evenly scattered around the region that they cover. To achieve this intuitive idea,
we provide a density metric and greedily minimize this metric. We empirically show that this
simple idea leads to precomputed trees that can solve more randomly-generated planning queries
than previous methods for building diverse trees.

In this subsection, we show how to precompute a tree that has diverse paths and that can
scale to large environments. Although our method is simple, it has many advantages. We can
precompute a tree with a more efficient computation time than the algorithms that have been
previously proposed [10, 30]. Our precomputed tree can solve more planning queries than trees
built with previous methods, given the same amount of memory for storing the tree. We can build
a tree for any memory size available for storing it, and we can build a tree that can cover a region
of arbitrary shape and size.

Algorithm. Our algorithm precomputes a search tree by incrementally adding a node and its
corresponding edge to the tree. We use a “density” metric to essentially scatter the edges or paths
of the tree evenly throughout the region that we would like to build the tree in.

We use the following notation in the algorithm below. Let A be the set of actions. Recall
that we are given whether or not each action can transition to other actions. For every i, j (i can
equal j), if action a_i can transition to action a_j, Transition(a_i, a_j) is true. Otherwise, it is false.
Let T be the precomputed tree, n be each node of T, and N be the set of all nodes. Each node
corresponds to one action, denoted by n.action; n.childs denotes the child nodes of node n. Let
e be each edge of T. Every time we add a new node n to T, we also add a corresponding edge
that connects n and its parent node; we refer to this combination as a node/edge. We refer to
the path that the action of a node covers as a traced path; more details are given below when we
describe the Trace() function. Let m be a 2D grid that covers the region occupied by the tree.

Algorithm 2: Precomputation of Scalable and Diverse Tree

    function Trace(e)
 1      m_temp.Init();
 2      foreach (x, y) ∈ Path(e) do
 3          mapx = Map(x);
 4          mapy = Map(y);
 5          m_temp(mapx, mapy) = 1;
        end
 6      return m_temp;

    function PrecomputeTree(A, K, R)
 7      T.Init(n_root);
 8      m.Init();
 9      d_overall ← 0;
10      for k = 1 to K do
11          n_near ← Nearest(N, α(k));
12          Δd_best = FLT_MAX;
13          a_best = NULL;
14          foreach a_j ∈ A do
15              if ¬Transition(n_near.action, a_j) then continue;
16              if T.AlreadyExpanded(n_near.childs, a_j) then continue;
17              e ← T.SimulateAddChild(n_near, a_j);
18              if OutsideRegion(R, Trace(e)) then continue;
19              m_current ← m ⊕ Trace(e);
20              Δd_current ← (Density(m_current) − d_overall) / Length(Trace(e));
21              if Δd_current < Δd_best then
22                  Δd_best = Δd_current;
23                  a_best = a_j;
                end
            end
24          if a_best == NULL then continue  /* do not increment k */;
25          e ← T.AddChild(n_near, a_best);
26          m ← m ⊕ Trace(e);
27          d_overall ← Density(m);
        end
28      return T;

Our algorithm can be summarized as follows: we iteratively add a node and its corresponding
edge to T. At each iteration, we “randomly” select which node in the existing T to expand from,
and then use a density metric to locally decide which child of the selected node to expand. This
iterative strategy is greedy and leads to non-optimal solution paths. However, since it is not
possible to precompute large-scale trees that can provide optimal solutions, we choose a strategy
that is fast and provides near-optimal solutions (as shown later in our experimental evaluation).
The intuition for the density metric is that since the tree is precomputed for any obstacle and any
start/goal queries, a simple but effective way to increase the likelihood of finding a solution is to
“scatter” the paths evenly in the region that T should be built in.

We now describe Algorithm 2 in more detail. In the PrecomputeTree() function, K is the
number of nodes to be built in T, and is a parameter that can be set depending on the memory
available for storing the tree. R is a 2D region that we want to build the tree in, and can be
arbitrarily large and be in any shape. Note that the obstacles and goal are not taken into account
during precomputation. The function starts by initializing T with a root node (n_root), which is a
placeholder node that contains no action and can transition to all other actions. This root node is
initialized with position (0, 0), orientation 0 and total cost 0. It initializes the grid m by setting
all its gridcells to 0. This grid provides a discretized “count” of the space that the tree covers.
d_overall is the density (described below) measure of m. We incrementally select a node/edge to
add to T. α(k) is a randomly-sampled point in R, and Nearest() selects the node in the existing
set N that is nearest to α(k). This randomly-sampled selection scheme is the same as in RRTs as
we explained in Chapter 2. We then try to add a child node to n_near: we choose to associate this
child node with an action whose traced path locally minimizes the density measure if that path
is added (lines 14-23). The AlreadyExpanded() function checks to see if a_j is already a child
of n_near. SimulateAddChild(n_near, a_j) simulates the effect of adding a new node representing
a_j as a child node of n_near. It does not add the new node and its corresponding edge to T
here; instead it returns information about the corresponding edge (which is represented by e on
line 17).

The Trace() function marks all the gridcells covered by Path(e). Path(e) (line 2)
takes an edge e that connects a parent node and a child node, and generates the “traced” 2D
path of motion if we start from the overall position at the parent node and take the action at the
child node. Path(e) then returns a set of discretized 2D points passing along this path. These
points have to be chosen so that we neither generate too many points and make the algorithm
inefficient, nor generate too few points and have them not cover all the gridcells that the path
covers. Map() (lines 3-4) maps from the coordinate system of the action/motion space to the
coordinate system of the grid m_temp. m_temp has the same shape and grid structure as m. To avoid
accessing all the cells of m_temp in each execution of Trace(), m_temp is initialized once in the
algorithm, and the (mapx, mapy) points are saved for resetting m_temp each time. Once we have
Trace(e), OutsideRegion() returns true if at least one of the gridcells marked by Trace(e) is
outside of R. R has the same grid structure as m. The ⊕ operator performs component-wise
addition of the grids. Density(m) takes a grid m (which does not have to be rectangular in shape)
with the “count” in the gridcells labelled c_i (i from 1 to ncells), and computes the “density” of
the paths in the tree:
\[ \mathrm{Density}(m) = \sum_{i=1}^{ncells} \left| c_i - \frac{\sum_{j} c_j}{ncells} \right| \tag{4.1} \]


Intuitively, a smaller density value means that the paths are more evenly spread out. The
algorithm locally minimizes this density metric to achieve this. This density metric actually measures
uniform density, as minimizing this metric corresponds to having paths that are evenly or uniformly
spread out. Length() is the distance that the “traced” 2D path travels. Since we have discretized
this path, we compute the number of gridcells that the discretized set of 2D points cover. The
reason for dividing by the length is to normalize for the length of the traced path when
considering the density value. AddChild(n_near, a_best) adds a new node representing a_best as a child node
of n_near, and also adds the corresponding edge.
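
The steps of the precomputation can be sketched in Python as below. This is an illustrative sketch under several assumptions of our own and should not be read as the implementation behind the results in this chapter: actions are hypothetical (distance, turn) pairs whose traced path is a straight line, R is an axis-aligned rectangle, Nearest() is a linear scan, Density() is taken to be the summed absolute deviation of the cell counts from their mean, and a `max_attempts` cap is added so that the loop terminates even if no node can be expanded.

```python
import math
import random


def trace(parent, action, cell_size, samples=8):
    """Cells covered by the (assumed straight-line) path of `action` taken from `parent`."""
    dist, turn = action
    theta = parent["theta"] + turn
    cells = set()
    for s in range(samples + 1):
        px = parent["x"] + dist * s / samples * math.cos(theta)
        py = parent["y"] + dist * s / samples * math.sin(theta)
        cells.add((math.floor(px / cell_size), math.floor(py / cell_size)))
    end = (parent["x"] + dist * math.cos(theta), parent["y"] + dist * math.sin(theta), theta)
    return cells, end


def density(counts, ncells):
    """Assumed form: summed absolute deviation of per-cell counts from their mean."""
    mean = sum(counts.values()) / ncells
    touched = sum(abs(c - mean) for c in counts.values())
    return touched + (ncells - len(counts)) * mean     # untouched cells hold 0


def precompute_tree(actions, transitions, K, region, cell_size, seed=0, max_attempts=100000):
    """region = (xmin, xmax, ymin, ymax); returns a list of node dicts, root first."""
    rng = random.Random(seed)
    xmin, xmax, ymin, ymax = region
    cx0, cx1 = math.floor(xmin / cell_size), math.ceil(xmax / cell_size)
    cy0, cy1 = math.floor(ymin / cell_size), math.ceil(ymax / cell_size)
    ncells = (cx1 - cx0) * (cy1 - cy0)
    nodes = [{"action": None, "x": 0.0, "y": 0.0, "theta": 0.0, "parent": None, "children": {}}]
    counts, d_overall, attempts = {}, 0.0, 0
    while len(nodes) - 1 < K and attempts < max_attempts:
        attempts += 1
        sx, sy = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)   # random sample in R
        near = min(nodes, key=lambda n: (n["x"] - sx) ** 2 + (n["y"] - sy) ** 2)
        best = None
        for name, action in actions.items():
            if near["action"] is not None and name not in transitions[near["action"]]:
                continue                                   # transition not allowed
            if name in near["children"]:
                continue                                   # already expanded
            cells, end = trace(near, action, cell_size)    # simulate adding the child
            if any(not (cx0 <= cx < cx1 and cy0 <= cy < cy1) for cx, cy in cells):
                continue                                   # path leaves the region
            trial = dict(counts)
            for c in cells:
                trial[c] = trial.get(c, 0) + 1
            delta = (density(trial, ncells) - d_overall) / len(cells)
            if best is None or delta < best[0]:
                best = (delta, name, cells, end)
        if best is None:
            continue                                       # nothing added; sample again
        _, name, cells, (x, y, theta) = best
        child = {"action": name, "x": x, "y": y, "theta": theta, "parent": near, "children": {}}
        near["children"][name] = child
        nodes.append(child)
        for c in cells:
            counts[c] = counts.get(c, 0) + 1
        d_overall = density(counts, ncells)
    return nodes


actions = {"straight": (1.0, 0.0), "left": (1.0, 0.6), "right": (1.0, -0.6)}
transitions = {name: set(actions) for name in actions}
tree = precompute_tree(actions, transitions, K=40, region=(-5, 5, -5, 5), cell_size=0.5)
```

The sketch recomputes the density from scratch for every candidate action, which is simple but slow; it is meant only to show how the random sample, the nearest-node selection, and the greedy density comparison fit together.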

In Figure 4.2, we showed the frequency plots of the exhaustive and pruned tree cases. We
do not show the plot of the scalable and diverse tree here as the plot simply has uniform density
and therefore has a constant color. Note that this is the case by construction as we wanted to
add paths to the tree in such a way that they would be evenly spread out. As we add more nodes
and paths to the tree, the density increases but the overall plot continues to have approximately
uniform density.


Properties  of  Tree.  We  show  that  our  algorithm  satisfies  several  desirable  properties. 

The execution time of the tree precomputation is O(K(log K + ||A|| * F)). The log K
term comes from the nearest neighbor computation. F is due to a faster way to compute the
Density(m) value: instead of iterating through all the gridcells of m, we keep a frequency count
of the values in each gridcell and compute Density(m) using this information. F is the largest
value with at least a count of one; it starts at 0 and increases as k increases. F is a function of K,
R, the cell sizes of m, and A (the space that each action covers). In practice, the K * ||A|| * F
term is more significant than the K log K term. In our experiments, the largest K we used is
about 2e6, ||A|| is about 10-20, and the largest F we have is about 500.
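
The frequency-count idea can be sketched in Python as follows. The sketch assumes that Density(m) is the summed absolute deviation of the cell counts from their mean; if the metric has a different form, only the last line of `density_from_histogram` changes. The names `density_from_histogram` and `add_path` are hypothetical. The list `hist` holds, for each value v from 0 to F, the number of gridcells whose count equals v, so evaluating the density takes O(F) time instead of a pass over every gridcell.

```python
def density_from_histogram(hist, ncells):
    """hist[v] = number of gridcells whose count equals v, for v = 0..F."""
    total = sum(v * n for v, n in enumerate(hist))
    mean = total / ncells
    return sum(n * abs(v - mean) for v, n in enumerate(hist))


def add_path(hist, counts, cells):
    """Increment every cell covered by one traced path and keep hist consistent."""
    for cell in cells:
        old = counts.get(cell, 0)
        counts[cell] = old + 1
        hist[old] -= 1
        if old + 1 == len(hist):       # a new largest value F appears
            hist.append(0)
        hist[old + 1] += 1


ncells = 4
hist, counts = [ncells], {}            # initially every cell has count 0
add_path(hist, counts, [(0, 0), (0, 1)])
add_path(hist, counts, [(0, 0)])
print(hist, density_from_histogram(hist, ncells))
```

After the two paths above, the cell counts are 2, 1, 0, 0, so `hist` is `[2, 1, 1]` and F is 2.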

Given  a  specific  precomputed  tree,  our  approach  is  not  complete.  However,  we  have  a  weaker 
notion  of  “complete”-ness  with  respect  to  the  given  tree:  if  there  is  a  solution  in  the  precomputed 
tree,  the  algorithm  will  find  it  in  finite  time;  if  there  is  no  solution  in  the  precomputed  tree,  it  will 
stop  and  report  failure  in  finite  time. 

We now show that given enough time (and memory), all the nodes in the exhaustive tree will
eventually be expanded. We define Exh(d) to be the exhaustive tree with finite depth d and finite
average branching factor b. We define a notion called Probabilistic Expansion:

\[ \lim_{k \to \infty} P(\, n_i \text{ will be expanded} \mid n_i \in Exh(d) \,) = 1 \tag{4.2} \]

where d can be arbitrarily large. Algorithm 2 follows this notion of Probabilistic Expansion.

Proof: We prove by contradiction: suppose there is at least one node, n, that is never expanded.
If n’s parent node is not expanded (but exists in the exhaustive tree), we instead set n to be its
parent node. We continue this until n’s parent node is expanded. We now have an unexpanded node n
whose parent node p is expanded. We must always at least have such a case because the tree’s
root node must be expanded at the k = 1 iteration. Let μ() be the measure of volume in a metric
space [56] and V(p) be the Voronoi region of p. We must have μ(V(p)) > 0 regardless of the
number of nodes in the current tree and k, since the tree has a finite size. Let the branching factor
of p be b, which is finite. As k → ∞, we must eventually sample V(p) b times (recall that we
only sample from the finite region R), and n must be expanded.


4.3  Precomputation  of  Environment  and  Goal  Gridmaps 

This  is  the  second  precomputation  step. 

Environment Gridmap. The intuition for this gridmap is that, during runtime, we map the obstacles to this grid and thereby to the tree. The discretized grid is then used for efficient runtime collision checks. We build an environment gridmap over the tree as shown in Figure 4.3 (left). The gridcells are all initially marked as UNOCCUPIED. Each node of the tree can then be associated with a gridcell. For example, node i corresponds to cell (x_i, y_i) in Figure 4.3 (right). We precompute and explicitly store the corresponding (x_i, y_i) value in each node so that we can quickly access the cell that a node is in during runtime. This is important for the efficiency of the algorithm.
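As a rough illustration of this precomputation step, the following Python sketch caches a cell index in every tree node. The names `TreeNode`, `cell_index`, and `build_env_gridmap` are hypothetical, and the choice of grid origin (the minimum node coordinates) is an assumption made only for this example.

```python
import math
from dataclasses import dataclass

UNOCCUPIED, OCCUPIED = 0, 1

@dataclass
class TreeNode:
    x: float            # node position in tree coordinates (metres)
    y: float
    cell: tuple = None  # (column, row), filled in by the precomputation

def cell_index(x, y, origin, cell_size):
    """Return the (column, row) of the gridcell containing point (x, y)."""
    return (math.floor((x - origin[0]) / cell_size),
            math.floor((y - origin[1]) / cell_size))

def build_env_gridmap(nodes, cell_size):
    """Size a grid that covers all nodes and cache each node's cell index."""
    origin = (min(n.x for n in nodes), min(n.y for n in nodes))
    for n in nodes:
        n.cell = cell_index(n.x, n.y, origin, cell_size)
    cols = max(n.cell[0] for n in nodes) + 1
    rows = max(n.cell[1] for n in nodes) + 1
    grid = [[UNOCCUPIED] * rows for _ in range(cols)]
    return grid, origin
```

At runtime a node's cell can then be read as `grid[n.cell[0]][n.cell[1]]` without any arithmetic, which is the point of storing the index in the node.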

The size of the gridcells is a parameter of the system, and we used a range of sizes from 14 cm to 28 cm. This parameter, however, affects the runtime phase significantly. If the side length of each cell is cut to a third of the original, the time for mapping the environment to the tree (Section 4.4) may increase by a factor of approximately nine, since the number of cells covering the same area grows quadratically. We want to balance between a large cell size, which decreases the runtime, and a small cell size, which represents the environment more accurately.

For each tree node i, we also precompute and store the values x_midtime_i and y_midtime_i (Figure 4.3, right). We first take half (with respect to time duration) of motion clip i to reach node midtime_i. The values x_midtime_i and y_midtime_i are then stored in node i. Node midtime_i is used temporarily in this calculation and does not exist in the tree. If the motion clip is one of the special motions such as jumping, we do not take half of the clip to get node midtime_i. Instead we use the point where the special motion is actually executed. For the example of jumping, this is where the character is at or near the highest point of its jump. This information should already be pre-annotated in the motion data. These “midtime” positions are used for collision checking in the runtime phase. For the special motions, they are used to see if the character successfully passes through the corresponding special obstacle. The choice of taking half of the clip is only a discretization parameter. For more accurate collision checking, we can continue to split the clip and compute similar information. We choose only the “midtime” discretization because our motion clips are short in length (hence midtime is enough), and a smaller discretization gives a faster runtime.

Figure 4.3: Left: An environment gridmap initialized with UNOCCUPIED cells. The intuition for this gridmap is that if cell (x_i, y_i) is occupied by an obstacle, the tree nodes corresponding to this cell and their descendant nodes (the black ones) are BLOCKED. Right: For each node i, we precompute and store the corresponding values x_i, y_i, x_midtime_i, and y_midtime_i.

Goal Gridmap. The intuition for the goal gridmap is that we want to start searching with the lowest cost path first during runtime. We precompute a goal gridmap (Figure 4.4, left) used in the runtime backward search phase (Section 4.5). For every node in the tree, we place it in the corresponding cell in the gridmap. Each cell then contains a sorted list of nodes (Figure 4.4, left). We used a range of cell sizes from 45 to 90 cm. The environment gridmap does not explicitly store this list of nodes, but the goal gridmap does.

Figure 4.4 (left) shows why a straightforward discretization of the goal gridmap may not work well. In this case, the node representing the “best nearby path” is close to the goal. However, it was placed into a nearby gridcell because of the discretization of the space. Another path is returned as the solution even though in some cases the “best nearby path” is clearly better. This happens when the goal is close to the boundary of the discretization. The goal gridmap itself cannot be adjusted or changed easily during runtime because it is precomputed and we do not know the goal position beforehand. Figure 4.4 (right) shows our solution to this problem. For each cell (x_i, y_i), we extend the size of the cell from d_orig to d_overlap. We then place the nodes of the tree that are in the extended cell into cell (x_i, y_i). This means that some of the nodes will be placed into more than one cell. Our values of d_orig are between 45 and 90 cm, and our values of d_overlap are between 105 and 210 cm.

Figure 4.4: Left: In the goal gridmap, each cell contains a sorted list of paths. Each path’s total cost is the sum of the cost of the motion states. The sorting is based on this total cost. Since each node in the tree corresponds to a unique path if we trace the node back towards the root of the tree, we can also say that each goal cell contains a sorted list of nodes. We will use this gridmap during runtime; the intuition is that if we know the gridcell that the goal position is in, the paths or nodes in that cell correspond to the potential solutions. Right: A straightforward discretization of the goal gridmap may not work well. An “overlapped discretization” works well.
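The overlapped discretization can be sketched in Python as follows. This is only an illustration: the function name `build_goal_gridmap` and the tuple layout of `nodes` are hypothetical, and the sketch assumes that each extended cell is centered on its original cell, which the text does not state explicitly.

```python
import math
from collections import defaultdict

def build_goal_gridmap(nodes, d_orig, d_overlap):
    """nodes: iterable of (x, y, cost, node_id) in tree coordinates.
    Returns {(col, row): [node_id, ...]} with each list sorted by cost."""
    margin = (d_overlap - d_orig) / 2.0  # assumed extension on each side
    reach = math.ceil(margin / d_orig)   # how many neighbouring cells to test
    cells = defaultdict(list)
    for x, y, cost, node_id in nodes:
        cx, cy = math.floor(x / d_orig), math.floor(y / d_orig)
        for i in range(cx - reach, cx + reach + 1):
            for j in range(cy - reach, cy + reach + 1):
                # Extended cell (i, j) spans [i*d - margin, (i+1)*d + margin).
                if (i * d_orig - margin <= x < (i + 1) * d_orig + margin and
                        j * d_orig - margin <= y < (j + 1) * d_orig + margin):
                    cells[(i, j)].append((cost, node_id))
    return {c: [nid for _, nid in sorted(lst)] for c, lst in cells.items()}
```

With d_orig = 0.9 m and d_overlap = 2.1 m, a low-cost node lying just across a cell boundary still appears in the neighbouring cell's sorted list, so a goal near that boundary can find it.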


4.4  Mapping  Obstacles  to  Environment  Gridmap 

This  is  the  first  main  runtime  step.  The  precomputed  tree  and  the  environment  are  in  different 
coordinate  spaces.  We  must  first  align  these  spaces.  We  do  not  transform  the  tree  towards  the 
space  of  the  environment  because  there  is  much  more  information  in  the  tree  and  this  would 
be  more  costly.  Instead,  we  translate  and  rotate  the  starting  position  (of  the  character  in  the 
environment)  to  match  the  root  node  of  the  precomputed  tree,  and  the  starting  orientation  (of  the 
character) to face the forward direction of the tree’s root node (Figure 4.5, left). All the obstacles
in  the  environment  are  translated  and  rotated  similarly. 
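A minimal Python sketch of this alignment is shown below. The function name `make_world_to_tree` is hypothetical, and the sketch assumes a heading convention (the angle of the forward direction measured from the +x axis, in radians) and a root pose at the origin facing +x; neither is specified in the text.

```python
import math

def make_world_to_tree(start_xy, start_heading,
                       root_xy=(0.0, 0.0), root_heading=0.0):
    """Return a function T that maps global (x, y) points into the tree frame,
    so that the character's start pose coincides with the tree's root pose."""
    dtheta = root_heading - start_heading
    c, s = math.cos(dtheta), math.sin(dtheta)

    def T(p):
        # Translate so the start is at the origin, rotate, then move to the root.
        dx, dy = p[0] - start_xy[0], p[1] - start_xy[1]
        return (root_xy[0] + c * dx - s * dy,
                root_xy[1] + s * dx + c * dy)

    return T
```

The same `T` would be applied to every obstacle point and to the goal position, so only a handful of points are transformed rather than every node of the tree.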

We then map each transformed obstacle to the environment gridmap. We want to mark each cell of the environment gridmap as either OCCUPIED or as a valid region of a special obstacle. If an obstacle is outside the region covered by the tree, we can safely ignore it. Otherwise, we map it to the environment gridmap by iterating through a discretized set of points inside the obstacle (Figure 4.5, right).

Figure 4.5: Left: We align the coordinate spaces between the environment and the tree. We translate and rotate the obstacles and the goal position so that the starting position and orientation (of the character in the environment) match with that of the precomputed tree. Right: If the size of the gridcell is d, we can guarantee that the mapping of an obstacle to the environment gridmap is correct if the sampling of points for the obstacle is at most d/√2 apart.

If a gridcell in the environment gridmap is OCCUPIED, we know that the tree nodes in that cell are BLOCKED. But in order to save time, we will not mark them as such until it is necessary to do so in the runtime backward search step. In addition, the indices of each gridcell that gets marked as being occupied are saved as the mapping proceeds. We use this information to quickly reset the environment gridmap every time we re-execute this mapping process.
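The mapping and the fast reset can be sketched as follows for a rotated rectangular obstacle. All names here are hypothetical. The sketch assumes that "at most d/√2 apart" refers to the sample spacing along each obstacle axis, and it handles only regular obstacles (special-obstacle regions are omitted).

```python
import math

UNOCCUPIED, OCCUPIED = 0, 1

def sample_axis(half_extent, step):
    """Evenly spaced offsets in [-half_extent, half_extent], at most `step` apart."""
    n = max(1, math.ceil(2 * half_extent / step))
    return [-half_extent + 2 * half_extent * k / n for k in range(n + 1)]

def map_obstacle(grid, cell_size, center, half_w, half_h, angle, marked):
    """Mark the cells of `grid` hit by sample points of a rotated rectangle.
    Indices of newly marked cells are appended to `marked` for a later reset."""
    step = cell_size / math.sqrt(2)
    c, s = math.cos(angle), math.sin(angle)
    for u in sample_axis(half_w, step):
        for v in sample_axis(half_h, step):
            x = center[0] + c * u - s * v
            y = center[1] + s * u + c * v
            i, j = math.floor(x / cell_size), math.floor(y / cell_size)
            # Samples outside the region covered by the grid are ignored.
            if 0 <= i < len(grid) and 0 <= j < len(grid[0]):
                if grid[i][j] == UNOCCUPIED:
                    grid[i][j] = OCCUPIED
                    marked.append((i, j))

def reset_gridmap(grid, marked):
    """Undo a previous mapping by touching only the cells that were marked."""
    for i, j in marked:
        grid[i][j] = UNOCCUPIED
    marked.clear()
```

Keeping the `marked` list means a reset costs time proportional to the number of occupied cells rather than to the size of the whole grid.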


4.5  Runtime  Backward  Search 

This is the second main runtime step. If the final goal is within the region covered by the tree, we can execute the runtime path finding algorithm just once. If the final goal is further away, we first use the bitmap planner (Section 4.6) to generate a coarse solution path. We then repeatedly use this coarse path to select sub-goals, which are used in repeated iterations of the runtime path finding algorithm.

Algorithm 3: Runtime Path Finding

     Goal_GoalGrid ← T(Goal_Global)
     P ← GoalGrid[Goal_GoalGrid.x][Goal_GoalGrid.y].nodes()
     foreach p ∈ P do
         while ((p.BLOCKED == false) and
                (EnvGridmap[p.x_i][p.y_i] == UNOCCUPIED) and
                (p != rootNode)) do
  1          p.BLOCKED ← true
             // midtime collision check
  2          if isSpecialMotion(p.motionState) then
                 if EnvGridmap[p.x_midtime_i][p.y_midtime_i] != specialObstacle(p.motionState) then
                     continue to next p
                 end
             else
  3              if EnvGridmap[p.x_midtime_i][p.y_midtime_i] == OCCUPIED then
                     continue to next p
                 end
             end
             p ← p.parent
         end
         // this path traced through before
  4      if p.BLOCKED == true then continue to next p
         // this path is blocked by an obstacle
  5      if EnvGridmap[p.x_i][p.y_i] == OCCUPIED then
  6          p.BLOCKED ← true
             continue to next p
         end
         // reached rootNode and path found
  7      return node representing current path
     end
     return no path found

Sub-goal Selection. This part is only necessary if we use the coarse bitmap planner. The coarse-level path has many points that we can use as sub-goals. Intuitively, we would like to find ones that will be within the dark red regions (Figure 4.2) of the precomputed tree. We choose the sub-goal to be the point in the coarse path whose distance from the start is closest to a fixed distance (Figure 4.6). A distance between 10 and 12 meters worked well in our examples. Note that the start is different for each iteration of the runtime path finding phase.
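The sub-goal rule can be written in a few lines of Python. The function name `select_subgoal` and the representation of the coarse path as a list of (x, y) tuples are assumptions for this sketch; the default of 10 m is taken from the range reported above.

```python
import math

def select_subgoal(coarse_path, start, target_dist=10.0):
    """Return the coarse-path point whose distance from `start` is
    closest to `target_dist` (metres). Ties go to the earliest point."""
    return min(coarse_path,
               key=lambda p: abs(math.dist(p, start) - target_dist))
```

Because `start` changes after every iteration, the function would be called again with the end of the previously kept partial path as the new start.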

Runtime Path Finding Algorithm. Algorithm 3 takes a goal position as input, and returns the tree node that represents the solution path. If there is no possible path in the precomputed tree that can reach the goal, it will recognize that there is no solution (among the possible paths in the tree). The inputs also include the precomputed tree, the environment gridmap, and the goal gridmap. The goal position is first translated and rotated from its global coordinates to the coordinate system of the goal gridmap (function T). The transformed indices are used to find P, the list of nodes sorted in increasing cost. We then go through each node p in P, and try to trace it back towards the root node (which is where the start is in the current iteration). As we trace back towards the root, we mark each node as BLOCKED, provided it is not already BLOCKED and is not obstructed by an obstacle. The intuition behind this is that we want to find the shortest path that is not obstructed in any way. We also check to see that the “midtime” point of the motion clip reaching that node is not obstructed (line 3) before tracing back to its parent node. Furthermore, if we have arrived at that node through a special motion (line 2), we check to see if the motion successfully goes through the corresponding special obstacle by checking to see if the “midtime” point is a valid region of that type of special obstacle. The specialObstacle() function returns the type of this corresponding obstacle. If the “midtime” point is obstructed in any way, the algorithm will continue to the next possible node in P.

Figure 4.6: The 2 columns correspond to the first 2 iterations of the runtime path finding phase for this example. The top row shows the start (green sphere) in each iteration, and the sub-goal (red sphere) selected from the coarse-level path. The bottom row shows the path returned by the runtime path finding algorithm (light and dark blue) and the partial path chosen (dark blue only). An estimate of the outline of the precomputed tree is shown. The tree is transformed to the global space only in the figure to show how it relates to the other parts of the environment. There is only one precomputed tree, and it is never transformed to the global space in the algorithm.

There are three conditions under which each trace of node p towards the root node stops (Figure 4.7):

1. Pointer p arrives at a node that is obstructed by an obstacle (case 1 in Figure 4.7 (top) and line 5 in Algorithm 3). When this happens, the path from the root node to p cannot be a solution. We mark that node as BLOCKED (these are the black nodes in Figure 4.7 and also the ones marked BLOCKED in line 6) and proceed to test the next node in P.

2. Pointer p arrives at a BLOCKED node (case 2 and line 4). When this happens, the path from the root node to p also cannot be a solution. The algorithm then continues to test the next node in P. The red nodes in Figure 4.7 are the nodes that were traced back. These are the ones that are marked as BLOCKED in line 1 of Algorithm 3.

3. Pointer p arrives at the root node (case 3 and line 7). The green nodes correspond to the nodes traced back for this p. Note that these are also marked as BLOCKED, but it does not matter since we already have the solution. We stop testing the list of nodes P, and return the original node that p points to in this case (the darker green node in Figure 4.7 top) to represent the solution path. If we have gone through the whole list P without having reached the root node, there is no possible solution.

Figure 4.7: The process of tracing back the list of sorted nodes P towards the root node in Algorithm 3. Left: The 3 cases under which each trace of node p stops. The sub-goal is inside the dashed square (a cell of the goal gridmap). Right: Simple example. The blue nodes are the nodes of the precomputed tree. The sub-goal is somewhere in the square-shaped box of red nodes. The other colored nodes correspond to the 3 cases.
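The backward trace and its three stopping cases can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the original implementation: the `Node` fields, the dictionary-based environment gridmap, and the returned `examined` count are all assumptions, and the sketch treats any non-free cell at a node position as an obstruction.

```python
from dataclasses import dataclass
from typing import Optional

UNOCCUPIED, OCCUPIED = "UNOCCUPIED", "OCCUPIED"

@dataclass
class Node:
    cell: tuple                      # precomputed (x_i, y_i) cell index
    mid_cell: tuple                  # precomputed midtime cell index
    parent: Optional["Node"] = None  # None only for the root
    special: Optional[str] = None    # obstacle type needed by a special motion
    blocked: bool = False

def find_path(candidates, env, root):
    """candidates: goal-cell nodes sorted by increasing cost.
    env: dict mapping a cell to OCCUPIED or to a special-obstacle type;
    missing cells are UNOCCUPIED.
    Returns (solution node or None, number of candidates examined)."""
    def state(cell):
        return env.get(cell, UNOCCUPIED)

    for examined, leaf in enumerate(candidates, start=1):
        p, midtime_hit = leaf, False
        while not p.blocked and state(p.cell) == UNOCCUPIED and p is not root:
            p.blocked = True                                  # line 1
            if p.special is not None:                         # line 2
                midtime_hit = state(p.mid_cell) != p.special
            else:                                             # line 3
                midtime_hit = state(p.mid_cell) == OCCUPIED
            if midtime_hit:
                break
            p = p.parent
        if midtime_hit or p.blocked:                          # line 4, case 2
            continue
        if state(p.cell) != UNOCCUPIED:                       # line 5, case 1
            p.blocked = True                                  # line 6
            continue
        return leaf, examined                                 # line 7, case 3
    return None, len(candidates)
```

Each candidate is followed through parent pointers only, so the loop needs no open list or priority queue; the sorted order of `candidates` is what makes the first successful trace the lowest-cost one.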


The solution path from the algorithm is in the coordinate system of the precomputed tree. We must therefore transform (T^-1) each node in the path back to the global coordinate system. The bottom row of Figure 4.6 shows examples of the algorithm’s output.

Before running a new iteration of the path finding phase, we need to UNBLOCK all the nodes of the precomputed tree. As we run Algorithm 3, we save the number of nodes in P that were examined. To “reset” the tree, we go through the same number of nodes in P again (in the sorted order). For each node p, we trace the path back towards the root node and UNBLOCK every node along the path, stopping when p arrives at an already UNBLOCK-ed node. We are therefore able to only traverse the nodes that were BLOCK-ed.
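A sketch of this reset in Python is given below. The minimal `Node` class and the function name `unblock` are hypothetical; `examined` stands for the saved count of nodes in P that the previous search looked at.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.blocked = False

def unblock(candidates, examined):
    """Clear the BLOCKED flags set while tracing the first `examined`
    candidates. Returns how many nodes were visited."""
    visited = 0
    for leaf in candidates[:examined]:
        p = leaf
        # Stop as soon as the trace reaches a node that is already clear.
        while p is not None and p.blocked:
            p.blocked = False
            visited += 1
            p = p.parent
    return visited
```

Replaying the candidates in the same sorted order matters: a later trace that stopped at a node blocked by an earlier trace finds that node already cleared and stops there too, so no node is visited twice.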


Properties of Runtime Path Finding. The execution time of this algorithm is O(N_goal_largest × d_largest), since each candidate node is traced back through at most d_largest nodes. N_goal_largest is the largest number of nodes/paths in one gridcell among all the cells of the goal gridmap, given a precomputed tree. It is typically in the thousands, and up to a few tens of thousands. d_largest is the largest depth in the given precomputed tree. It typically lies between ten and fifty. This execution time explains the efficiency of the runtime search.

The runtime backward tracing is a “lazy” way to discover nodes that cannot be reached for a given goal and obstacle configuration. In addition, given that we use a backward search approach, the runtime search checks through the smallest number of nodes with respect to a specific precomputed tree and a planning query. This is because: (i) we only need to search through the nodes/paths that are in the cell of the goal gridmap that the goal is in; (ii) we have to check the least cost nodes/paths first because we want to return the least cost solution; and (iii) we must go through at least the nodes in cases 1 and 2 of the runtime search (the red nodes in Figure 4.7) to check for potential obstacle collisions. The validity of the third point is not obvious. Recall that we first map the obstacles to the region covered by the tree during runtime. At that point, we could have identified all the nodes that are blocked by the obstacles, and the descendant nodes of these nodes, since they are also blocked by the obstacles (we cannot get to these descendant nodes from the root of the tree). Instead we discover these blocked nodes lazily, by applying cases 1 and 2 of the runtime search. The number of nodes that these two cases go through is at most the total number of “blocked” nodes. Finally, it is possible that a forward search (from the start towards the goal) can lead to a smaller number of nodes we have to check. However, our backward search approach is simpler to implement because there is only one unique path towards the tree’s root as we move from the goal back towards the start. Hence we only need to keep a pointer to the parent node of each node, and move one pointer along the nodes to check through them.


Partial Paths. This part is only necessary if we use the coarse bitmap planner. In Section 4.6 we annotate the points in the coarse-level path that appear just before the special obstacles and the final goal. These annotations are used to keep the character from getting too close to these obstacles or the final goal before re-executing the runtime phase. Intuitively, if there is a large obstacle just beyond the current planning horizon, the runtime algorithm may generate a path that brings the character right up to this obstacle. In the next iteration, the character may be so close to it that there is nowhere to go given the motion capabilities of the character. To avoid this issue, we keep only a part of the solution path so that each iteration can better adjust to the global environment.

More specifically, if the current sub-goal is not annotated as being near a special obstacle or the final goal, we keep the whole solution path. Otherwise, we only keep the first two nodes of the solution path (Figure 4.6, bottom). In addition, if the solution path includes a special node, we take all the nodes up to and including it. This is an optional adjustment that again helps the character in adjusting itself before executing a special motion. Our experience shows that without this adjustment, the algorithm may sometimes inaccurately report that there is no solution.

Furthermore,  we  have  to  make  sure  that  the  last  motion  state  of  the  path  we  keep  and  the  root 
node  of  the  precomputed  tree  can  transition  to  the  same  states.  If  this  is  not  the  case,  we  need 
to  add  an  additional  state  or  leave  out  the  last  one.  This  is  because  the  next  iteration  of  the  path 
finding  phase  uses  the  same  precomputed  tree  and  therefore  starts  at  the  root  node.  Since  the 
majority  of  our  motion  states  transition  to  each  other,  this  is  not  a  major  concern. 

Motion  Synthesis.  The  path  finding  phase  eventually  returns  a  sequence  of  motion  states  that 
allow  the  character  to  navigate  from  the  start  to  the  goal.  This  sequence  is  converted  to  character 
motion  by  concatenating  together  the  motion  clips  that  represent  the  states.  For  the  frames  near 
the transition points between states, we linearly interpolate the root positions and apply a smooth-in/smooth-out slerp function to the joint rotations. The joint rotations are originally expressed as Euler angles. They are converted into quaternions, interpolated with the slerp function, and converted back into Euler angles.
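The blending of one transition frame can be sketched as follows. This is an assumption-laden illustration: the function names are hypothetical, the cubic `smoothstep` weight is just one common ease-in/ease-out curve (the text does not say which one was used), and the Euler-to-quaternion conversions are left out because they depend on the rotation order of the motion data.

```python
import math

def smoothstep(t):
    """Ease-in/ease-out weight on [0, 1] (one common choice, assumed here)."""
    return t * t * (3.0 - 2.0 * t)

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                    # flip one input to take the shorter arc
        q1, dot = tuple(-c for c in q1), -dot
    if dot > 0.9995:                 # nearly parallel: linear blend is stable
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        w0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        w1 = math.sin(t * theta) / math.sin(theta)
        q = tuple(w0 * a + w1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)

def blend_frame(root0, root1, q0, q1, t):
    """Blend one frame: linear root position, eased slerp for a joint rotation."""
    root = tuple(a + t * (b - a) for a, b in zip(root0, root1))
    return root, slerp(q0, q1, smoothstep(t))
```

Here `t` runs from 0 to 1 across the blend window, and `blend_frame` would be called once per joint with that joint's two quaternions.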


4.6  Coarse-Level  Planner 


If the goal position lies within the space covered by the precomputed tree, we can apply the runtime path finding module once and immediately find a solution path. However, the precomputed tree has a finite size and the goal position can be further away than the region covered by the tree. Hence in general, we first use a fast bitmap planner to generate a coarse-level path from the start to the goal. This path is then used as a guideline for picking sub-goals to run each iteration of the runtime path finding phase. In our implementation, we used a bitmap planning algorithm [48] optimized for 2D grids.

A coarse-level map of the environment is used as input to the bitmap planner. The size of the gridcells was about 70 cm. We map the obstacles to this coarse gridmap using the technique discussed in Section 4.4. In particular, we only use the regular obstacles in this step, and not the special ones (i.e., ones that the character has to jump over). A path is returned that may go through these special obstacles. This is fine since the character has the motion capabilities (e.g., jumping) to go through these obstacles when we perform the runtime path finding step.

The special obstacles are then added to the coarse-level map. We can eliminate parts of the returned path that collide with these obstacles. In addition, we annotate the parts of the path that appear just before the special obstacles, and the parts that appear just before the final goal position (but not the final goal position). Figure 4.8 shows an example of the points in the path that are eventually chosen. Note that in this example, there is an obstacle that the character must duck under, and one that it must jump over.

Figure 4.8: The points of the path that are eventually chosen by the coarse-level planner for this environment (perspective view).


4.7  Evaluation 

First, we demonstrate the effectiveness of our algorithm by building an interactive system where multiple characters can navigate and respond to user changes to the obstacles and goal positions. Second, we empirically compare our scalable and diverse tree to previous tree precomputation methods. Third, we empirically compare our overall precomputation approach with traditional forward search methods. Fourth, we explore the effect of different grid resolutions on the runtime cost and success rate of finding a solution.

4.7.1  Interactive  System 

We describe our system and additional issues we have to handle for synthesizing multiple characters. The motions for multiple characters are generated in real-time. We do not generate the full path of each character at the beginning. For each character, we execute a “runtime path finding” phase to synthesize the next partial path only after we start rendering the first frame from the previous partial path. Hence the characters can re-plan and respond to user changes to the environment as the simulation is running.

Algorithm 4 shows the pseudocode of the planning and rendering system. We specify a maximum planning time (T_planning), and plan as many characters as possible within each iteration of the draw loop. The exception is that we plan for every character once when we execute the planning loop for the first time. We are ready to plan the current character (line 1 of Algorithm 4) after we start rendering the first frame (of the current character) from the previously planned partial path.

Algorithm 4: Multiple Character Interactive System

     Initialize environment
     Precompute tree and gridmaps
     // draw loop
     while true do
         Read input
         if input detected then
             Update environment state
         end
         // planning loop
         while current planning time < T_planning do
             Advance to next character
             if all characters planned in current planning loop then
                 Stop current planning loop
             end
  1          if ready to plan current character then
  2              Generate next partial path/motion
                 Store results in buffer
             end
         end
         foreach character do
             Draw pose from buffer data depending on time
         end
     end

To generate the next partial path/motion (line 2), we execute the runtime path finding phase. We run the coarse-level planner with the end of the last partial path as the updated starting location. The sub-goal selection and runtime iteration are done just once, since we only need the first partial path here. We use the same precomputed tree for all the characters, so we reset the gridmaps and precomputed tree after each character’s runtime iteration.
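The time-budgeted planning loop can be sketched in Python as follows. The function `planning_loop` and its callback parameters `is_ready` and `plan_partial_path` are hypothetical stand-ins for the readiness test and the runtime path finding phase; the round-robin cursor is an assumption about how "advance to next character" carries over between draw-loop iterations.

```python
import time

def planning_loop(characters, cursor, budget_s, is_ready, plan_partial_path, buffers):
    """Plan characters round-robin until the time budget is spent or every
    character has been visited once. Returns the cursor for the next frame."""
    deadline = time.perf_counter() + budget_s
    visited = 0
    while time.perf_counter() < deadline and visited < len(characters):
        ch = characters[cursor]
        cursor = (cursor + 1) % len(characters)   # advance to next character
        visited += 1
        if is_ready(ch):
            # Generate the next partial path/motion and store it for drawing.
            buffers.setdefault(ch, []).append(plan_partial_path(ch))
    return cursor
```

Returning the cursor lets the next draw-loop iteration resume with the character after the last one planned, so a tight budget does not starve characters late in the list.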

We  also  precompute  the  blending  frames.  The  characters’  poses  are  blended  at  the  transition 
points  between  motion  clips.  We  precompute  the  blending  frames  for  all  possible  pairs  of  motion 
clips  so  we  can  efficiently  use  them  at  runtime.  We  place  the  correctly  blended  poses  in  the  data 
buffer  as  we  store  the  results. 

In addition, we need to deal with collision avoidance between characters. We apply the following three steps. First, we use a global characters gridmap to store the time-indexed global positions of the characters after their poses are stored to the data buffer. These positions are placed into the correct bucket in the gridmap for efficient access later. Second, in addition to mapping the obstacles to the environment gridmap (Section 4.4), we map the characters’ positions to a local characters gridmap using a similar procedure. During the runtime iteration, we select the cells in the global characters gridmap that are relevant in each local sub-case. This ensures that the collision check between characters is linear in the number of characters, instead of being quadratic as in the naive case. The positions are also placed into the appropriate bucket in the local gridmap. Third, as we trace through the tree nodes in Algorithm 3, we check to see if each node can collide with the locally relevant characters. An additional test is needed in the while loop that performs a Euclidean distance check between the node’s position and each of the relevant characters’ positions. With the use of the local characters gridmap, this step is fast because we rarely have to perform a distance check.

Figure 4.9: Screenshot of the interactive system. The characters interactively respond to user changes to obstacles and their respective goal locations while navigating in a large environment.
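The bucketed distance check can be sketched as follows. The function names and the `radius` parameter are hypothetical, and the sketch ignores the time index of the stored positions for brevity.

```python
import math
from collections import defaultdict

def build_character_grid(positions, cell_size):
    """Bucket character positions (x, y) by gridcell."""
    grid = defaultdict(list)
    for x, y in positions:
        grid[(math.floor(x / cell_size), math.floor(y / cell_size))].append((x, y))
    return grid

def collides(node_xy, grid, cell_size, radius):
    """True if any character lies within `radius` of the node position.
    Only the node's cell and nearby cells are inspected."""
    cx = math.floor(node_xy[0] / cell_size)
    cy = math.floor(node_xy[1] / cell_size)
    reach = math.ceil(radius / cell_size)
    for i in range(cx - reach, cx + reach + 1):
        for j in range(cy - reach, cy + reach + 1):
            # Empty buckets cost a dictionary lookup and no distance check.
            for other in grid.get((i, j), ()):
                if math.dist(node_xy, other) < radius:
                    return True
    return False
```

Most nearby buckets are empty in practice, which is why the Euclidean distance test is rarely reached.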

Each iteration of the runtime phase takes about 8.5 ms. The fast runtime allows us to build an interactive application. Our interactive system (Figure 4.9) demonstrates the following strengths of our method: (i) multiple characters following navigation goals and avoiding obstacles that the user can interactively modify; (ii) the ability to incorporate behaviors such as jumping and ducking, so that the characters are not limited to navigating on a flat terrain; and (iii) interactive motion synthesis of up to 150 human-like characters.

4.7.2  Comparison  of  Tree  Precomputation  Methods 

As there has been recent work on the topic of path diversity, we explore these algorithms and experimentally compare them with our tree precomputation method. We compare our algorithm with four previous methods (including our original PST method). The key here is to compare only the trees that are built. We use the same runtime backward search (our method) with the different trees to solve planning queries. This is because the previous methods only describe how to build trees, but not how to use them to solve planning queries. For the experiments in this subsection, we are only able to build trees that fit in a relatively small environment. This is because it is not clear how we can use the previous methods to build large trees.

[...]                  14  7  5  66,270  23,059  6,960  3,298  102.19  2.28
Branicky et al. I-E    35.33  26.22  21.84  10.79  7.76  4.72  780  138  49  21  11  9  120,155  42,329  13,347  5,309  121.87  20.13
Green and Kelly [80]   68.89  66.27  62.48  57.93  48.82  31.87  11460  1800  480  80  19  6  55,019  16,526  4,285  1,841  112.02  8.22

Table 4.1: Comparison of Tree Precomputation Methods.

[Panels: SPST; original PST; Branicky et al. (2008) I-P; Branicky et al. (2008) I-E; Green and Kelly (2007)]

Figure 4.10: Examples of precomputed trees used in our comparison. All trees have the same number (826) of nodes. Each tree’s root is at (0,0), and the paths move in a forward (or up in the figure) direction because the input actions/motions allow the character to move forward and/or slightly turn left/right. Note that many paths overlap because of the tree’s structure.

We generate a large number of random planning queries and try to solve them with the trees precomputed by the different methods. We select a fixed starting position and orientation, and generate random goal positions; this is equivalent to generating random start/goal queries. Some of the other tree-building methods select their paths from an exhaustive tree with 5 depth levels, so these methods can only solve queries within the region covered by that exhaustive 5-level tree. We let R be this region for SPST as well, which explains the shape of the SPST tree in Figure 4.10. Hence we select random goal positions within R so that the comparison is fair. We generate obstacles by randomly choosing the number of obstacles and, for each rectangular obstacle, its position, orientation, and size. Each obstacle must at least overlap with R. We use the same set of random queries for all of the methods; we did not include queries where the start and/or goal collides with obstacles.
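The query-generation procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in our experiments: it assumes that R is an axis-aligned rectangle, that obstacles are axis-aligned (the experiments also randomize orientation), and that the start is the tree root at (0,0). The names `Rect`, `random_query`, and `generate_queries`, and all numeric ranges, are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned rectangle given by its center and size (an assumed simplification)."""
    cx: float
    cy: float
    w: float
    h: float

    def contains(self, x, y):
        return abs(x - self.cx) <= self.w / 2 and abs(y - self.cy) <= self.h / 2

    def overlaps(self, other):
        return (abs(self.cx - other.cx) <= (self.w + other.w) / 2
                and abs(self.cy - other.cy) <= (self.h + other.h) / 2)


def random_query(region, rng, start=(0.0, 0.0), max_obstacles=8, max_size=2.0):
    """Return (goal, obstacles), or None if the start or goal collides."""
    obstacles = []
    count = rng.randint(1, max_obstacles)  # random number of obstacles
    while len(obstacles) < count:
        candidate = Rect(
            rng.uniform(region.cx - region.w, region.cx + region.w),
            rng.uniform(region.cy - region.h, region.cy + region.h),
            rng.uniform(0.1, max_size),
            rng.uniform(0.1, max_size),
        )
        if candidate.overlaps(region):  # each obstacle must at least overlap R
            obstacles.append(candidate)
    goal = (  # random goal position inside R
        rng.uniform(region.cx - region.w / 2, region.cx + region.w / 2),
        rng.uniform(region.cy - region.h / 2, region.cy + region.h / 2),
    )
    if any(o.contains(*start) or o.contains(*goal) for o in obstacles):
        return None  # discard queries whose start or goal is in collision
    return goal, obstacles


def generate_queries(region, count, rng):
    """Keep sampling until `count` valid queries have been collected."""
    queries = []
    while len(queries) < count:
        query = random_query(region, rng)
        if query is not None:
            queries.append(query)
    return queries
```

Seeding the generator (for example `random.Random(0)`) and reusing the resulting list for every method mirrors the requirement that all methods are evaluated on the same set of queries.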

In Table 4.1, all the methods use the same runtime backward search technique (our technique described above), since the last three methods only provide algorithms to build the tree; the difference is in the tree precomputation technique. SPST is the Scalable and diverse version of our tree, and “original PST” is the technique in Lau and Kuffner [53]. “I-P” stands for Inner-Product and “I-E” stands for Inclusion-Exclusion. For the last three methods in the table, we first built the exhaustive tree with 5 depth levels and then selected a subset of paths using each method. Note that SPST can have paths with depth levels larger than 5; for the last three methods, the exhaustive tree with larger depth levels cannot be built because of its size, and it is not clear how to pick a subset of potential longer (than depth 5) paths to choose from. For all methods, we tried to choose parameters that give the best results.

We build trees of varying sizes: the numbers in the top row are the memory in KB that we use to store the tree. We use the same amount of memory to store each node of the tree for all methods, so the trees in each column have the same number of nodes. “% success” is the percentage of the 1186 total planning queries that can be solved. We also tried to solve this set of queries with the exhaustive tree of 5 depth levels; it took about 2 MB to store this tree and its success rate was 71.16 %. The percentages for SPST can be higher than 71.16 since the precomputed trees for SPST can have paths longer than 5 depth levels. The “precomputation time” is the time for building the trees only. We used a 2.4 GHz machine with 1 GB of RAM. The “density value” is from the Density() formula. The “path cost” columns are for the 50 KB case; we have similar results for the other cases. We took the queries (248 of them) where all methods found a solution and compared the costs of these solutions. We normalize the costs for the exhaustive tree case (the optimal case) to be 100, and normalize the other costs correspondingly. We then computed the mean and standard deviation of all the normalized costs for each method (so 100 % is optimal).
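The path-cost normalization can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: costs are stored per method as a dictionary from a query identifier to a cost, a missing identifier means that the method failed on that query, and the population standard deviation is used (the text does not say which variant was computed). The function name `normalized_cost_stats` and the toy numbers are hypothetical.

```python
from statistics import mean, pstdev


def normalized_cost_stats(costs, baseline):
    """costs: {method: {query_id: cost}}; a missing query_id means the method failed.

    Returns ({method: (mean, std)}, number_of_common_queries), where each cost is
    expressed as a percentage of the baseline cost on the same query.
    """
    # Only queries solved by every method are compared.
    common = set.intersection(*(set(per_query) for per_query in costs.values()))
    stats = {}
    for method, per_query in costs.items():
        normalized = [100.0 * per_query[q] / costs[baseline][q] for q in sorted(common)]
        stats[method] = (mean(normalized), pstdev(normalized))
    return stats, len(common)


# Toy data: query 3 is dropped because "SPST" has no solution for it.
toy_costs = {
    "exhaustive": {1: 10.0, 2: 20.0, 3: 8.0},
    "SPST": {1: 11.0, 2: 20.0},
    "other": {1: 12.0, 2: 26.0, 3: 9.0},
}
toy_stats, toy_common = normalized_cost_stats(toy_costs, baseline="exhaustive")
```

Normalizing per query before averaging keeps long and short queries on the same scale, which is why the baseline always reports a mean of 100.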


The results show that, based on the % success rates, the ranking of the methods starting with the best is: SPST, Green and Kelly [30], Branicky et al. [10] I-P, original PST, and Branicky et al. [10] I-E. This holds for all memory sizes. This is a significant result, as one way to say that a tree has diverse paths is to show that it can handle many types of environments. We have shown experimentally that our SPST method can handle more randomly generated planning queries (or environments) than the other methods. The precomputation time for SPST is longer than that for original PST. However, the precomputation can be done beforehand, and the time for SPST is still reasonable. In contrast, the other three methods are significantly slower to precompute; their times increase at such a rate that it is difficult to use them in practice for large trees, and we chose to build trees with depth levels of 5 (very small) for this set of experiments just so that we can compare the methods. The density values justify our use of the density metric. A smaller density value tends to correspond to a higher % success rate, which matches our intuition that scattering the paths of the tree evenly is more likely to lead to a precomputed tree that can solve more planning queries. The tradeoff of SPST here is that it provides near-optimal rather than optimal solutions.


Figure 4.10 explains some of the results in Table 4.1. “original PST” and “Branicky I-P” tend to keep shorter and thereby smaller-cost paths. On the other hand, “Branicky I-E” seems to prefer longer paths, and hence its solutions are likely to be further away from optimal. “Green and Kelly” builds trees that have more diverse paths. However, its precomputation time is the longest, and it is not practical for trees of large sizes. SPST builds the most diverse trees in the sense that their paths are spread out over the region R, in this case the region covered by the 5-level exhaustive tree. Our results show that our simple randomized method is efficient and can outperform other tree precomputation methods in its ability to solve randomly generated planning queries. This suggests that the effectiveness of sampling-based methods also applies to our paradigm of motion planning with precomputation, although this was not an insight that we originally set out to show.

We would like to mention one recent work on the topic of path diversity. Erickson and LaValle [19] describe an approach to build sets of diverse paths based on a survivability metric. This metric tries to decrease the likelihood that paths will be obstructed by the same obstacle. For example, two paths that mostly overlap with (or are close to) each other are less preferred than two paths that cover different regions. It is interesting to note that our simple density metric also implicitly tries to achieve this. As this work was published only recently, we were unable to implement their method to compare it with ours. However, it shows that the issue of path diversity is also of concern to others, and comparing their method with ours would certainly be one possibility for future work.

4.7.3  Comparison between Precomputation Approach and Traditional Forward Search

We explore the benefits and tradeoffs of the overall precomputation approach with the runtime backward search, as compared to traditional forward search methods. We believe it is useful to understand what we have gained and/or lost from using our overall precomputation approach and backward search. We specifically compare with A*-search methods, which are common forward search methods that provide optimal solutions. By “backward” search, we refer to the runtime search that starts from the goal location and searches the paths in the precomputed tree from the goal back towards the starting location. This is in contrast to “forward” search, which builds the tree during runtime from the start to the goal. For the experiments in this subsection, we use relatively larger environments since we are able to build trees of a much larger scale.
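To make the notion of a backward search over a precomputed tree concrete, the following Python sketch traces candidate paths from the goal back to the root. It is a simplified illustration and not our actual data structures: the `Node` layout, the per-cell index, the `blocked` predicate standing in for collision checks, and the choice to try cheaper candidates first are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    pos: Tuple[float, float]   # position relative to the tree root at (0, 0)
    parent: Optional[int]      # index of the parent node; None for the root
    cost: float                # accumulated cost from the root


def cell_of(pos, cell_size):
    """Grid cell containing a position."""
    return (int(pos[0] // cell_size), int(pos[1] // cell_size))


def index_by_cell(tree, cell_size):
    """Precomputed lookup from a grid cell to the tree nodes inside it."""
    index = {}
    for i, node in enumerate(tree):
        index.setdefault(cell_of(node.pos, cell_size), []).append(i)
    return index


def backward_search(tree, index, goal, cell_size, blocked):
    """Return node indices from root to goal, or None if no stored path is free."""
    candidates = sorted(index.get(cell_of(goal, cell_size), []),
                        key=lambda i: tree[i].cost)
    for leaf in candidates:
        path, i = [], leaf
        while i is not None and not blocked(tree[i]):
            path.append(i)
            i = tree[i].parent
        if i is None:          # reached the root without hitting an obstacle
            return path[::-1]
    return None


# A tiny hand-built tree with two routes into the goal cell.
toy_tree = [
    Node((0.0, 0.0), None, 0.0),
    Node((0.0, 1.0), 0, 1.0),
    Node((0.0, 2.0), 1, 2.0),
    Node((1.0, 1.0), 0, 1.5),
    Node((0.5, 2.2), 3, 3.0),
]
toy_index = index_by_cell(toy_tree, 1.0)
```

No tree nodes are created at runtime; the search only follows stored parent pointers, which is the source of the speed advantage over a forward search that must expand nodes from the start.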

We  generate  random  planning  queries  as  before,  except  that  we  use  a  much  larger  R  region 
and  generate  a  larger  number  of  obstacles.  We  created  one  additional  test  environment  with  a 
C-shaped obstacle (similar to the “deep local minima” example in [14]). The random queries
contain a mix of simple and complex cases, and this C-shaped obstacle case is a complex example with local minima. Since A*-search and SPST search in different directions, we place the start/goal positions differently in the two cases so that the search direction is always moving “into” the C-shape, which makes the problem more difficult.


Method          runtime      collision checks    % success    path cost

A*-search       100.00       100.00              97.91        100.00
wA* (w=2)       79.42        12.31               97.91        105.12
SPST (50 MB)    0.47         7.27                94.76        113.80
SPST (25 MB)    0.44         4.23                93.63        115.48

A*-search       2,411,505    2,885,740           N/A          786
wA* (w=2)       1,276,468    1,559,632           N/A          846
SPST (25 MB)    461          210                 N/A          884

Table 4.2: Comparison between Precomputation Approach and A*-search Methods. Top set of results: from random planning queries. Bottom set: from C-shaped obstacle case.


In Table 4.2, SPST took 199 seconds to precompute the 25 MB tree and 477 seconds to precompute the 50 MB tree. The “runtime” of SPST covers only the runtime backward search, and “collision checks” is the number of collision checks performed. “% success” is the percentage of the 1774 total queries for which each method found a solution. The top set of results are all percentages. We took the queries (1661 of them) where all methods found a solution and compared the runtime, collision checks, and path cost of these solutions. We normalize these values (runtime, collision checks, and path cost, each separately) for the A*-search case (the optimal case) to be 100, and normalize the other values correspondingly. We then computed the mean of all the normalized values for each method, and reported these means in the table (top set). The bottom set of results are actual values; the runtime in that case is in μs.

The main benefit of SPST over A*-search methods is the significantly faster runtime (>200 times faster for the random planning queries). SPST performs fewer collision checks than A*-search, although a greedier version (weighted A*) can also lead to fewer collision checks. The main tradeoff of SPST is that it gives up the completeness and optimality of A*-search. The loss of completeness can be seen in the “% success” column: SPST’s rates are a few percentage points lower. The “% success” of SPST cannot exceed that of A*-search, because SPST is only able to find solution paths that are in the precomputed tree. Hence it is encouraging that SPST is only slightly worse here. The loss of optimality can be seen in the “path cost” column: SPST’s path costs are near-optimal, usually about 10-15 % higher than the optimal costs. In general, as we increase the memory size of the tree, the % success rate increases and the path cost percentage decreases towards the optimal values. The user can adjust the tree’s memory size to explore this tradeoff. The purpose of the C-shaped obstacle case is to make sure that the better results do not just come from simple queries in the random set. This is indeed the case, as SPST achieves an even larger runtime speedup and an even larger reduction in collision checks for this case.
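The speedup figures quoted above follow directly from Table 4.2, as the short Python calculation below shows. Note one caveat: the top set of the table reports means of per-query normalized values, so inverting a mean percentage gives only a rough speedup factor, not the mean of the per-query speedups. The helper name `speedup_from_percent` is hypothetical.

```python
def speedup_from_percent(percent_of_baseline):
    """Rough speedup factor implied by a runtime given as a percentage of A*-search."""
    return 100.0 / percent_of_baseline


# Random planning queries (top set of Table 4.2, percentages of A*-search runtime).
random_speedup_50mb = speedup_from_percent(0.47)   # roughly 213x
random_speedup_25mb = speedup_from_percent(0.44)   # roughly 227x

# C-shaped obstacle case (bottom set of Table 4.2, actual values).
c_shape_runtime_ratio = 2_411_505 / 461            # A*-search vs SPST runtime, roughly 5231x
c_shape_checks_ratio = 2_885_740 / 210             # A*-search vs SPST collision checks
```

Both random-query factors exceed 200, and the ratios for the C-shaped case are larger still, which is consistent with the claim that the advantage does not come only from simple queries.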

We have shown the advantages and disadvantages of our precomputation approach compared to A*-search methods. We view our precomputation concept as one approach that can be considered among various planning methods. It is important to understand the tradeoffs of our approach before deciding to use it.


4.7.4  Effect  of  Grid  Resolution 

The obstacle avoidance between the characters and the objects in the environment depends on the grids that we use. There are several grids placed over the tree and the environment. The grid resolution is an important parameter that affects our results. We therefore empirically study the effect of different grid resolutions on the runtime cost and the success rate of finding a solution, for A*-search and SPST. We also explain the failure cases of both methods in more detail.

We  used  the  same  experimental  setup  as  for  the  comparison  between  A*-search  and  SPST 
above.  We  changed  only  the  grid  resolution  and  kept  the  other  variables  the  same.  The  grid 
resolution  here  refers  to  the  one  for  the  Environment  Gridmap;  we  adjust  the  resolution  for  the 
other  gridmaps  accordingly. 


                   runtime                                               % success
Method             270x270           540x540           1080x1080         270x270    540x540    1080x1080

A*-search          100.00  104880    100.00  151770    100.00  340195    97.91      97.91      97.91
SPST (25 MB)       0.41    333       0.52    690       0.80    2652      93.63      94.31      94.76

Table 4.3: Effect of grid resolution on runtime cost and success rate.


Table 4.3 shows the results from our experiments. The success rate is the percentage of the 1774 total queries for which each method found a solution. For the runtime results, there are two values in each entry. The first value is a percentage, and the second one is the average time for the success cases in μs. To compute the percentages, we took the queries where both methods found a solution and compared the runtime of these solutions. We normalize these values for the A*-search case (the optimal case) to be 100, and normalize the other values correspondingly. We then computed the mean of all the normalized values for each method, and reported these means in the table.

In general, we found that a finer grid resolution leads to an increase in runtime. This makes sense intuitively, as mapping the obstacles to a finer grid takes longer. We also found that a finer grid resolution leads to an increase in the success rate. Intuitively, as the obstacle representation gets finer, more space is represented as collision free, and there is a higher chance that more paths become collision free.
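Both effects can be seen in a small Python sketch that maps one obstacle onto grids of the three resolutions used in Table 4.3. The sketch is illustrative only and makes several assumptions that are not taken from our implementation: the environment is a square of a hypothetical side length, the obstacle is an axis-aligned rectangle, and a cell is marked as blocked whenever it overlaps the obstacle at all.

```python
import math


def blocked_cell_count(rect, env_size, resolution):
    """Number of cells of a resolution x resolution grid that overlap the rectangle.

    rect is (x0, y0, x1, y1) in environment coordinates.
    """
    x0, y0, x1, y1 = rect
    cell = env_size / resolution
    ix0 = max(0, math.floor(x0 / cell))
    iy0 = max(0, math.floor(y0 / cell))
    ix1 = min(resolution - 1, math.floor(x1 / cell))
    iy1 = min(resolution - 1, math.floor(y1 / cell))
    return (ix1 - ix0 + 1) * (iy1 - iy0 + 1)


def blocked_area(rect, env_size, resolution):
    """Area that the grid reports as occupied by the rectangle."""
    cell = env_size / resolution
    return blocked_cell_count(rect, env_size, resolution) * cell * cell


ENV_SIZE = 27.0                        # hypothetical side length of the environment
OBSTACLE = (3.33, 4.47, 9.81, 12.26)   # hypothetical obstacle, not aligned to any grid
TRUE_AREA = (OBSTACLE[2] - OBSTACLE[0]) * (OBSTACLE[3] - OBSTACLE[1])

counts = [blocked_cell_count(OBSTACLE, ENV_SIZE, r) for r in (270, 540, 1080)]
areas = [blocked_area(OBSTACLE, ENV_SIZE, r) for r in (270, 540, 1080)]
```

In this toy setting the number of cells to mark grows with the resolution (more work per obstacle), while the area reported as blocked shrinks towards the true obstacle area (more space is treated as collision free).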

It is interesting to discuss the failure cases and what causes them. For A*-search, there are cases with no solution for two reasons. The first is that the obstacle configuration does not allow the goal to be reached. The second is that the obstacles are cluttered enough that no path can reach the goal given the existing motion clips. For SPST, there are cases with no solution because the precomputed tree does not contain all the potential paths (given the existing motion clips). In fact, its number of paths is much smaller than that for the A*-search case, as A*-search can build an exhaustive tree to cover the free space in the environment. It is therefore reasonable to expect the success rate for SPST to be lower. The fact that it is only a few percentage points lower shows that the precomputed case performs quite well. We would like to discuss one specific case where A*-search found a solution but SPST did not. In this case, the A*-search solution was not long-winded, but it required the right combination of motion clips to get to the goal (as the environment was cluttered with many obstacles). SPST did not find a solution because it did not have that specific combination of motion clips in the precomputed tree.


4.8  Discussion 

We  have  developed  a  “Precomputed  Search  Trees”  technique  for  interactively  generating  realistic 
animations  for  virtual  characters  navigating  in  a  complex  environment.  The  main  contributions 
are  twofold.  First,  we  introduce  a  novel  planning  approach  based  on  precomputation:  we  first 
precompute a search tree of possible motion paths and then use a backward search method during runtime to solve planning queries. We show that our approach is more than two orders of
magnitude  faster  than  traditional  forward  search  methods.  Second,  we  present  a  technique  for 
precomputing  scalable  and  diverse  trees,  and  explore  the  advantages  and  disadvantages  of  our 
method  compared  to  previous  methods  for  building  diverse  trees. 

While  there  has  been  previous  work  on  computing  information  about  the  environment  or 
character  motions  in  advance  for  use  during  runtime,  the  main  additional  value  of  our  work 
over previous work is that we show a complete system that demonstrates the concept of precomputation for motion planning: we show how to precompute large and diverse search trees;
we  describe  an  efficient  runtime  backward  search  method  for  solving  planning  queries;  we  use 
these  methods  in  actual  planning  scenarios  and  show  runtime  results;  and  we  have  an  interactive 
system  with  many  characters  navigating  in  complex  environments  using  our  approach. 

Some limitations of precomputed search trees are similar to those of our behavior planning approach. We assume that we are given a set of blendable and segmented motion clips as inputs. This is again not a major concern, as the number of behaviors and motion clips is small. An output sequence is limited to a concatenation of the input clips; if further motion editing is needed, other existing methods have to be used.

Another limitation of precomputation is that we give up the completeness and optimality of the solution, compared to A*-search. The benefit in return is the two orders of magnitude runtime speedup (also compared to A*-search).

One insight we have gained from our work is that precomputation is certainly a viable approach for motion planning. However, there is a tradeoff between memory, runtime speed, and optimality. These issues should be considered before choosing between precomputation and traditional search methods.

Another insight is that our randomized diverse tree works surprisingly well in terms of being able to solve planning queries, even though the method is simple. Our method and the previous methods for tree precomputation are all greedy. This makes sense since it is computationally intensive to analyze path sets of large sizes. The lesson here is that if we assume the method is going to be greedy, a randomized approach is a simple one that works well. Spending a lot of time computing information about which paths to pick might lead to over-analyzing, in the sense that much time can be spent without improving the result.

There has been a growing amount of recent work on the topic of path diversity. We believe that this is an important issue, and more future work can be done on this topic. More generally, computing diverse path sets in advance is one essential component of the concept of precomputation. However, most previous work does not use these precomputed path sets to actually solve planning problems. We believe that further experiments (perhaps similar to ours) can be done to compare work on precomputing path sets based on their ability to handle different types of environments.

Chapter 5

Modeling Spatial and Temporal Variants in Motion Data


Variation  in  human  motion  exists  because  people  do  not  perform  actions  in  precisely  the  same 
manner  every  time.  Even  if  a  person  intends  to  perform  the  “same”  action  more  than  once,  each 
motion  will  still  be  slightly  different.  However,  current  animation  systems  lack  the  ability  to 
realistically  produce  these  subtle  variations.  For  example,  typical  crowd  animation  systems  [|6^ 
utilize  a  few  walking  motion  clips  for  every  walking  cycle  and  every  character  of  the  simulation. 
This can lead to synthesized motions that look unrealistic due to the exact repetition of the original walk cycles. Hence a variation model that can generate even slight differences of the original walk cycles has the potential to greatly improve the naturalness of the output animations. Crowds in games and films [99] also do not produce human-like variations. Films use a technique known
as  “Cycle  Animation”:  animators  use  a  fixed  number  of  motion  cycles  to  create  the  motions 
of  multiple  characters.  Inevitably,  there  will  be  cycles  that  are  exactly  repeated  both  spatially 
(in  multiple  characters)  and  temporally  (at  different  times  for  the  same  character).  As  soon  as 
even  one  example  of  repetition  is  identified,  the  whole  animation  can  be  immediately  deemed 
un-human-like.  This  can  make  games  and  films  less  fun  and  interesting. 

There is much interest in the problem of adding variety to virtual crowds. Maim and his colleagues [64] take a fixed number of template character meshes, and vary them by changing their color and adding different accessories to them. On the other hand, our work takes a fixed number of template motions and synthesizes new variant motions from them. McDonnell and her
colleagues  libTl  perform  user  experiments  to  study  the  perception  of  clones  in  virtual  crowds.  The 
focus  of  their  work  is  to  study  the  perception  of  appearance  and  motion  clones,  and  to  provide 
insights  on  how  to  make  it  less  likely  for  the  end-user  to  detect  such  clones  assuming  that  clones 
are being used. In contrast, our approach takes input data, learns a model from the data, and synthesizes new motion variants with the model. Our approach creates motion with no exact clones, even though the new variants can be visually similar. We start by assuming that it is possible to generate variation in motion such that exact motion clones are not necessary.

For the problem of generating variation in motion, previous methods consider variation to be an additive noise component. This is not robust for automatically generating animations. There
are  methods  to  add  noise  to  existing  motion  [HllTSll,  but  there  is  no  guarantee  that  the  added  noise 
will match well with the existing motion.

We believe that variation should not be just an additive noise component; instead, we take a data-driven approach to this problem. Given a small number of examples of a particular type of motion (e.g., cheering, a walk cycle, a swimming breast stroke) as input, we learn a model from the input data, and use this model to synthesize spatial and temporal variants of that motion. We claim that the Dynamic Bayesian Network (DBN) [|2^  iTTll model solves this problem well as it provides a formal and robust approach to model the distribution of the data. A DBN represents a multivariate probability distribution of the degrees-of-freedom of motion, and it is this distribution from which we sample to synthesize our new variants. In addition, one advantage of our approach is that it can handle a small number of input examples. This is useful as it is difficult to acquire a large number of examples of a particular motion. Another advantage is that no post-process smoothing operation is needed, which is beneficial as such an operation may smooth out details of motion that our method generates.

There are three major steps for learning a model and synthesizing new variants. First, we learn the structure of the DBN with the input examples. We use a greedy algorithm based on a variant of the Bayesian Information Criterion score to learn a good structure. Second, we use the learned structure and the original data to synthesize new variants. Third and optionally, we can use an inverse kinematics method developed within our DBN framework to satisfy any foot and hand constraints.

The key result of our method is that we can take a few examples of a particular type of motion as input, and produce an unlimited number of spatial and temporal variants as output. A new variant is spatially different as all new poses are distinct from those of the input examples, and temporally different as the timing of the whole motion is distinct from the input examples. The new variants are statistically and visually similar to the inputs, but are not exact copies. We demonstrate our approach with a variety of full-body human motion data. The memory requirement of our model consists of only the space required to store the few input examples. Most of the processing time is in the learning phase; synthesizing new variants at runtime is very efficient and can be done as a continuous stream one frame at a time. To evaluate our approach, we perform a user study to show that: (i) our new variants are just as natural as motion capture data, and (ii) our new variants are less repetitive than “Cycle Animation”. In addition, we demonstrate that “just adding noise” to existing motion can create poses and timings that look obviously awkward. We show this with two methods to add noise to motion: (i) a naive/strawman method, and (ii) the Perlin noise function. Finally, since our input examples have to be similar (so that we can model their variation), it is useful to know what we mean by “similar” and how we obtain such examples to begin with. Hence we provide a DBN-based method to take examples from raw data, and reduce them to a small number of examples that can be used well with our approach.


5.1  Problem  Statement  and  Overview 

The inputs to our problem are a few examples of a particular type of motion, and the outputs are the spatial and temporal variants (of the inputs). Let the inputs be $X_i[j]^{1:l}$, where there are $l$ motion sequences (usually four), $i$ is an index for DOF, and $j$ is an index for time. We build a model for the joint distribution of the inputs. Let this model be $\{\mu(X_i), \mathrm{var}(X_i), G_{prior}, G_{trans}\}$, where $\mu(X_i)$ and $\mathrm{var}(X_i)$ are only for the nodes $X_i$ in $G_{prior}$ with $Pa(X_i) = \emptyset$, and $G_{prior}$ and $G_{trans}$ are the prior and transition networks in the DBN model. In addition, we need to keep the original data because both the learning and synthesis processes require non-parametric regressions. Let a synthesized output motion sequence be $Y_i[j]$, where the range of $j$ does not have to match the lengths of the input sequences.

The  following  is  an  overview  of  the  major  parts  of  the  rest  of  this  chapter: 

Dynamic  Bayesian  Network.  We  specify  the  notations  and  definitions  of  a  Bayesian  Network 
and  a  Dynamic  Bayesian  Network. 

Structure  Learning.  We  explain  the  details  of  our  learning  method.  Given  a  small  number 
of  motion  clips  that  represent  variations  of  a  particular  motion,  we  search  for  the  structure  of  the 
prior  and  transition  networks  of  a  DBN  model  automatically.  We  describe  our  non-parametric 
regression  method  for  computing  conditional  probability  distributions;  this  is  used  during  both 
learning  and  synthesis. 

Synthesis  of  New  Variants.  The  learned  model  and  original  input  data  are  used  to  generate 
any  number  of  spatial  and  temporal  variants  of  the  inputs. 

Constraints.  We  develop  an  inverse  kinematics  framework  that  fits  with  our  DBN  model  to 
satisfy  foot  and  hand  constraints. 

Evaluation.  We  show  results  for  full-body  human  motion  data.  The  same  approach  is  used  for 
all  types  of  data.  We  discuss  the  memory  requirements  and  performance  time  of  our  technique. 
We  perform  a  user  study  to  evaluate  our  approach.  We  discuss  our  experiments  with  adding  noise 
to  motion  data. 

Inputs  that  work  well  with  Our  Approach.  The  main  limitation  of  our  approach  is  that 
the  inputs  have  to  be  “similar”  to  begin  with.  It  is  difficult  to  define  what  is  meant  by  “similar” 
motions.  Instead,  we  develop  a  method  to  characterize  the  inputs  that  would  work  well  with  our 
approach. 


5.2  Dynamic  Bayesian  Network 

We  first  describe  the  basic  formulation  and  notations  for  a  Bayesian  Network  (BN)  model,  and 
then  extend  this  description  to  a  Dynamic  Bayesian  Network  (DBN)  model  [|2^  12711. 

A BN is a directed acyclic graph that represents a joint probability distribution over a set of
random variables $X = \{X_1, \ldots, X_n\}$. Each node of the graph represents a random variable. The
edges represent the dependency relationship between the variables. A node $X_i$ is independent of
its non-descendants given its parent nodes $\mathrm{Pa}(X_i)$ in the graph. This conditional independency
is significant because we only use the values of parent nodes of $X_i$ to predict the value of each
$X_i$. This graph defines a joint probability distribution over $X$ as follows:

\[ P(X_1, \ldots, X_n) = \prod_{i} P(X_i \mid \mathrm{Pa}(X_i)) \tag{5.1} \]
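
The factorization of a BN joint distribution into per-node conditionals can be illustrated with a small sketch. The three-node network, its binary variables, and all probability values below are invented for illustration; the motion model in this chapter uses continuous DOF values instead.

```python
# Toy Bayesian network A -> C <- B over binary variables.
# The structure and every probability below are invented for illustration.
parents = {"A": [], "B": [], "C": ["A", "B"]}

# cpt[node][parent_values] = P(node = 1 | parents = parent_values)
cpt = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}


def joint_probability(assignment):
    """Return P(X_1, ..., X_n) as the product of P(X_i | Pa(X_i))."""
    prob = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        p_one = cpt[node][pa_values]
        # Each factor only looks at the node and its parents.
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob
```

Because each factor depends only on a node and its parents, predicting one variable never requires the values of the remaining non-parent nodes.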

Figure 5.1: A DBN for the variables $X_i$. Each node $X_i$ represents one DOF in the motion
data. We use the prior network to model the first 2 frames. The transition network then models
subsequent frames given the previous 2 frames. We assume a second-order Markov property
because it is the simplest model that works well.

A DBN models the process of how a set of random variables changes over time. It represents
a joint probability distribution over all possible trajectories of the random variables. Figure 5.1
shows an example. In our case of human motion, $X_i$ is the trajectory of values of the $i$-th DOF
of the motion, and $X[t]$ is the set of values of all the DOFs at time $t$. $X_i[t]$ is the value of the
$i$-th DOF at time $t$. We have 62 DOFs in our motion data, so $n$ is 62. The prior network $G_{prior}$
represents the joint distribution of the nodes in the first two time points, $X[0]$ and $X[1]$. The
transition network $G_{trans}$ specifies the transition probability $P(X[t+2] \mid X[t], X[t+1])$ for all $t$.
Note that the transition network predicts the values at time $t+2$ given those at $t$ and $t+1$. Hence
there are no incoming edges into the nodes at time $t$ and $t+1$. In the usual formulation of DBNs,
there can be edges between the nodes in time $t+2$. In our case, we do not allow edges between
these nodes. We find that this simplification does not affect our results, since we assume that an
edge from $X_1[t+1]$ to $X_2[t+2]$ has the same effect as an edge from $X_1[t+2]$ to $X_2[t+2]$.
We assume that the trajectories satisfy the second-order Markov property: the values at $t+2$
are conditionally independent of the values before $t$ given the values at $t$ and $t+1$. We found
that assuming a first-order Markov property does not work well for our motion data. Hence we
assume a second-order Markov property, which is the simplest model that works well. We also
assume that the transition probabilities are stationary: the probabilities in $G_{trans}$ are independent
of $t$. The DBN defines a joint probability distribution over $X[0], \ldots, X[T]$:


\[ P(X[0], \ldots, X[T]) = P_{G_{prior}}(X[0], X[1]) \cdot \prod_{t=0}^{T-2} P_{G_{trans}}(X[t+2] \mid X[t], X[t+1]) \tag{5.2} \]

We apply a non-parametric approach to predict $X[t+2]$ given $X[t]$ and $X[t+1]$. Hence we
do not have parameters and we only learn the dependency structure from the data. The data itself
represents the “function” defined in a non-parametric approach. Note that our non-parametric
regression method for the transition network slightly differs from that of the prior network. This
improves the robustness of our approach: no post-process smoothing operation is needed.


5.3  Structure  Learning 

We take as input a small number (usually four) of motion clips of a particular type of motion. The
motion need not be cyclic. These motion clips must be “similar” to each other and each motion
clip’s starting pose also needs to be similar, as we are trying to model the variation between the
clips. Their lengths can be different, and no timewarping or synchronization of these input clips
is needed.

Let $n_{seq}$ be the number of input motion sequences, where the $l$-th motion sequence has length
$n_l$. For each sequence, the data in the first two frames ($X[0]$ and $X[1]$) are used to train the prior
network. If $n_{seq}$ is large enough, we can use the first two frames from each sequence. Otherwise,
we can also take more pairs of frames near the beginning of each sequence. For example, we
can take the first ten pairs of frames ($X[0]$ and $X[1]$, $X[1]$ and $X[2]$, ..., $X[9]$ and $X[10]$) as the
training data for the prior network. Let $n_{prior}$ be the total number of such instances or pairs of
frames. For the transition network, we use the previous two frames to predict each frame. Hence
there are a total of $n_{trans} = \sum_{l=1}^{n_{seq}} (n_l - 2)$ instances of training data for the transition network. The
structures for the prior and transition networks are learned separately given this data.
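
The way the two training sets are assembled from the input clips can be sketched as follows. This is a minimal illustration under stated assumptions: each sequence is a list of frames, each frame is a list of DOF values, and the function names and the `pairs_per_sequence` parameter are hypothetical.

```python
def prior_instances(sequences, pairs_per_sequence=1):
    """Collect consecutive frame pairs (X[j], X[j+1]) from the start of each sequence."""
    instances = []
    for seq in sequences:
        # Never read past the last available pair in a short sequence.
        for j in range(min(pairs_per_sequence, len(seq) - 1)):
            instances.append((seq[j], seq[j + 1]))
    return instances


def transition_instances(sequences):
    """Collect triples (X[t], X[t+1], X[t+2]); a sequence of length n_l yields n_l - 2."""
    instances = []
    for seq in sequences:
        for t in range(len(seq) - 2):
            instances.append((seq[t], seq[t + 1], seq[t + 2]))
    return instances
```

With four clips of 5, 7, 4, and 6 frames, for example, the transition set holds 3 + 5 + 2 + 4 = 14 instances, while the prior set holds 4 instances when only the first pair of each clip is used.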

Given the input data, we wish to learn the structure that best matches the data. This means that
we would like to find the graph, or set of edges, in the DBN that best matches the data. The
set of nodes is already defined as in Figure 5.1. We would therefore like to find the best $G$ that
matches the data $D$: $P(G \mid D) \propto P(D \mid G) \cdot P(G)$. This formulation leads to a scoring function
that allows us to compute a score for any graph. We then use a greedy search approach to find a
graph with a high score. The DBN literature provides many approaches to compute this score.
One possibility is the Bayesian Information Criterion (BIC) score: there is one term in this score
corresponding to $P(D \mid G)$ and one penalty term corresponding to $P(G)$. We use a similar score
except we do not have a penalty term. Instead we perform cross validation across the data by
splitting the data into training and test sets, a common strategy in existing DBN approaches.

Doing cross validation allows us to measure how well a given graph matches the data without
overtraining the graph on the data and without using a penalty term. Section 5.3.1 describes
the greedy search for a graph, and the scoring functions for the prior and transition networks in
more detail. To compute our score, we have to compute the conditional probability distribution
for each node: $P(X_i \mid \mathrm{Pa}(X_i))$. We use a non-parametric regression approach to compute this
probability. Section 5.3.2 provides justification for this approach, and more details about the
method.


5.3.1  Structure  Search 

We learn the structure by defining a scoring function for any graph, and then searching for a
graph with a high score. This is done separately for the prior and transition networks of the
DBN. The search part of our algorithm is the same as existing DBN techniques; the scoring
function however is different because of the non-parametric regression. The problem of finding
the graph with the highest score is in general an NP-Complete problem due to the large number
of nodes in the graph. Hence we use a greedy search approach.


Prior Network. The prior network is a BN. We derive the general scoring function by using a
maximum likelihood approach: our goal is to find the graph that maximizes $P(D \mid G)$. Remember
that we do not use a $P(G)$ term as we use cross validation and split the data into training and test
sets. The score for the prior network $G_{prior}$ is

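\begin{align}
\log P(D \mid G_{prior}) &= \log \prod_{j=1}^{n_{prior}} P(X^{(j)} \mid G_{prior}) \notag \\
&= \sum_{j=1}^{n_{prior}} \log P(X^{(j)} \mid G_{prior}) \notag \\
&= \sum_{j=1}^{n_{prior}} \sum_{i=1}^{2n} \log P\big(X_i^{(j)} \mid \mathrm{Pa}(X_i)^{(j)}\big) \tag{5.3}
\end{align}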

where $X^{(j)}$ represents the $j$-th instance of the prior network training data, and $X_i^{(j)}$ is the value at
node $X_i$ of the $j$-th instance of data. We sum over each instance of data for doing leave-one-out
cross validation: each instance is one example of testing data and the corresponding training
data (used in the non-parametric regression) does not include that instance. So the training data
for the $j$-th instance is the set of all $n_{prior}$ instances of the prior network training data except the
$j$-th instance. Note that we do not model the time component in the prior network even though
its nodes represent the first and second frames of the motion. Hence there are $2n$ total nodes. The last
equality is due to the conditional independence of the nodes given their parent nodes.

Algorithm 5 shows the pseudocode of the structure search for the prior network. The INITIALIZE
section begins with a prior network with any initial set of edges. $X_i^{(all)}$ is the set of all
instances for node $X_i$. For the cross validation, the function split_samples() splits the instances
into $S$ equal parts. Each part will be considered as a testing set $X_i^{(test_s)}$ and the remaining
instances form the training set $X_i^{(train_s)}$ in each of the $S$ cases. In the conditional probability
computation, the training sets are used implicitly in the non-parametric regression. $n_{test_s}$ is the
number of instances in the set $X_i^{(test_s)}$. curr_score stores the current contribution of each node
to the total score.

The SEARCH section (Algorithm 5) finds a good prior network based on the score. The
generate_edge_updates() function takes the current prior network $G_{prior}$ and computes a set of
prior networks $G_{prior}^{[*]}$ by making small changes to the edges in $G_{prior}$. There are three possible
edge updates: (i) an edge addition adds a directed edge between two nodes that were not originally
connected, (ii) an edge deletion deletes an existing edge, and (iii) an edge reversal reverses
the direction of an existing edge. Note that these are all subject to the BN constraint, so we
cannot apply an edge update that creates cycles in the graph. The take_best_score() function
recomputes the score for each prior network in $G_{prior}^{[*]}$. Since the total score can be separated
into sums of terms for each node $X_i$, we keep track of each node’s contribution to the total score.
Each edge update in the greedy search can affect only one or two nodes, so we will not have
to recompute the total score every time we update an edge. We then apply the edge update that
gives the best improvement towards the overall score, and we continue this process until there is
no improvement in the overall score. As this greedy method depends on the initial set of edges,
we can repeat the algorithm multiple times by initializing with a different set of edges every time.
We then take the set of edges with the highest score among the multiple runs.

Algorithm 5: Structure search for prior network
Initialize:
    $G_{prior}$ ← initial set of edges
    for i = 1 to 2n do
        curr_score[$X_i$] = 0
    end
    for i = 1 to 2n do
        for s = 1 to S do
            [$X_i^{(test_s)}$, $X_i^{(train_s)}$] ← split_samples($X_i^{(all)}$)
            [$\mathrm{Pa}(X_i)^{(test_s)}$, $\mathrm{Pa}(X_i)^{(train_s)}$] ← split_samples($\mathrm{Pa}(X_i)^{(all)}$)
            curr_score[$X_i$] += $\sum_{j=1}^{n_{test_s}} \log P(X_i^{(test_s)} \mid \mathrm{Pa}(X_i)^{(test_s)})$
        end
    end
Search:
    overall_score_improves = TRUE
    while ( overall_score_improves ) do
        $G_{prior}^{[*]}$ ← generate_edge_updates($G_{prior}$)
        $G_{prior}$ ← take_best_score($G_{prior}^{[*]}$)
    end
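
The greedy search over edge additions, deletions, and reversals can be sketched in a few lines. This is an illustrative sketch and not the implementation used in this thesis: the `score` callable is a hypothetical stand-in for the cross-validated log-likelihood, and for brevity the sketch rescoring every candidate in full rather than tracking each node's contribution incrementally.

```python
from itertools import permutations


def has_cycle(nodes, edges):
    """Depth-first check for a directed cycle in the graph (nodes, edges)."""
    children = {v: [] for v in nodes}
    for u, v in edges:
        children[u].append(v)
    state = {v: 0 for v in nodes}  # 0 = unvisited, 1 = on the DFS stack, 2 = done

    def visit(u):
        state[u] = 1
        for v in children[u]:
            if state[v] == 1 or (state[v] == 0 and visit(v)):
                return True
        state[u] = 2
        return False

    return any(state[v] == 0 and visit(v) for v in nodes)


def generate_edge_updates(nodes, edges):
    """Yield every acyclic edge set one addition, deletion, or reversal away."""
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            yield edges - {(u, v)}                      # deletion never adds a cycle
            flipped = (edges - {(u, v)}) | {(v, u)}
            if not has_cycle(nodes, flipped):
                yield flipped                           # reversal
        elif (v, u) not in edges:
            added = edges | {(u, v)}
            if not has_cycle(nodes, added):
                yield added                             # addition


def greedy_search(nodes, initial_edges, score):
    """Apply the best-scoring single edge update until no update improves the score."""
    current = frozenset(initial_edges)
    current_score = score(current)
    while True:
        candidates = [frozenset(g) for g in generate_edge_updates(nodes, current)]
        best = max(candidates, key=score, default=None)
        if best is None or score(best) <= current_score:
            return current, current_score
        current, current_score = best, score(best)
```

Since the result depends on `initial_edges`, the function can be called several times with different starting graphs and the highest-scoring result kept, as described above.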

Figure 5.2: When learning the structure for the transition network, we do a cross validation over
each motion sequence. We take each sequence as testing data, and use the others as training
data. For the testing sequence, we take the first two frames as input and re-synthesize the whole
sequence with the given structure. The newly synthesized sequence is then compared to the
original data to evaluate the structure. This is what we compute intuitively in the scoring function
for the transition network of the DBN. [Figure labels: Training Data; Testing Data; Compare.]

Transition Network. For the transition network, we use a similar algorithm to learn the
structure. The difference here is that we do not allow any incoming edges to the nodes at time
$t$ and $t+1$. The nodes at time $t$ and $t+1$ are assumed to be observed and are used to predict
those at time $t+2$. We initialize the graph with the edges from $X_i[t]$ to $X_i[t+2]$ ($\forall i$), and the
edges from $X_i[t+1]$ to $X_i[t+2]$ ($\forall i$). From our experience with the data, the search almost
always selects these edges. Hence we always keep these edges throughout the search to make
the process more efficient. The scoring function is similar to the one for the prior network. The
score for the transition network $G_{trans}$ is also derived from the $P(D \mid G)$ term:

\[ \log P(D \mid G_{trans}) = \sum_{l=1}^{n_{seq}} \sum_{j=2}^{n_l - 1} \sum_{i=1}^{n} \log P\big(X_i[j]^{(l)} \mid \widehat{\mathrm{Pa}}(X_i[j])^{(l)}\big) \tag{5.4} \]

where $X_i[j]^{(l)}$ is the value at node $X_i[j]$ of the $l$-th motion sequence of the transition network
training data. This score is different from the BN score in that we start with the first two frames
in each sequence, and compute the subsequent frames in the sequence by propagating the computed
frames. So the second frame and the newly synthesized third frame are used to compute
the fourth frame, the newly synthesized third and fourth frames are used to compute the fifth
frame, and so on. The $\widehat{\mathrm{Pa}}$ notation represents this propagation of frames. The justification for
this propagation instead of treating each instance separately is that the learned structure would
otherwise not give a good result: the predicted trajectories deviated from the actual ones when
we attempted to treat each instance separately. Intuitively, since we propagate the values when
we synthesize a new motion given the first two frames, we should do this propagation when we
learn the structure. We are effectively trying to compute how good a given structure is by trying
to re-synthesize each input motion sequence given the first two frames, and comparing the
synthesized sequence with the original data (Figure 5.2). Note that we sum over the $n$ nodes in time
$t+2$ as these are the ones we are trying to compute in the transition network.
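
The propagated scoring of one test sequence can be sketched as below. The sketch makes two assumptions that are not stated in the text: the `predict` callable is a hypothetical stand-in for the non-parametric regression and returns a (mean, standard deviation) pair per DOF, and the value propagated to later frames is the predicted mean.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a Gaussian with the given mean and standard deviation."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)


def propagated_score(sequence, predict):
    """Score a test sequence by re-synthesizing it from its first two frames.

    predict(prev2, prev1) returns one (mean, std) pair per DOF for the next frame.
    """
    synth = [list(sequence[0]), list(sequence[1])]
    score = 0.0
    for j in range(2, len(sequence)):
        # The parents are the synthesized frames, not the original ones.
        stats = predict(synth[j - 2], synth[j - 1])
        score += sum(gaussian_log_pdf(x, m, s) for x, (m, s) in zip(sequence[j], stats))
        synth.append([m for m, _ in stats])  # propagate the predicted means
    return score, synth
```

A structure whose predictions drift away from the original frames accumulates low log densities at later frames, so errors that compound during propagation are penalized directly.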

Algorithm 6 shows the pseudocode of the structure search for the transition network; it is
similar to the one for the prior network. The generate_edge_updates() function makes small
changes to $G_{trans}$ by using the same edge update rules (as the prior network), and creates a set of
transition networks $G_{trans}^{[*]}$. We perform cross validation for each graph in this set. For the cross
validation, each motion sequence forms the testing data ($test_l$) and the corresponding training
data ($train_l$) includes all the motion sequences except for the $l$-th sequence. The compute_score()
function gives us the score based on the equation given above for the transition network, except
that this function computes the score for each part of the cross validation and adds each score
to the total score. Note that $train_l$ is used implicitly in the non-parametric regression for computing
the conditional probability in the score. The take_best_score() function selects the $G_{trans}$
that gives the best improvement towards the overall score; it sets the overall_score_improves
variable if necessary. The whole process is repeated until the overall score does not improve.

Algorithm 6: Structure search for transition network
    overall_score_improves = TRUE
    while ( overall_score_improves ) do
        $G_{trans}^{[*]}$ ← generate_edge_updates($G_{trans}$)
        foreach $G \in G_{trans}^{[*]}$ do
            for l = 1 to $n_{seq}$ do
                compute_score($test_l$, $train_l$)
            end
        end
        $G_{trans}$ ← take_best_score($G_{trans}^{[*]}$)
    end

5.3.2  Non-Parametric  Regression  for  Computing  Conditional  Distribution 

The scoring functions for the prior and transition networks require the computation of the
conditional probability $P(X_i \mid \mathrm{Pa}(X_i))$. We briefly describe the parametric approaches that we
attempted to use. As these approaches did not work well, we decided to use a non-parametric
regression method.

Many BNs and DBNs that treat $X_i$ as a continuous variable use a linear regression model
to describe the relationship between $X_i$ and its parents. We attempted to model the relationship
between $X_i$ and its parent nodes as a linear relationship, but we found that it is not appropriate
for our motion data. We then attempted to model this relationship by nonlinear regression. We
tried to find the parameters of a nonlinear function that takes the parents of $X_i$ as input and $X_i$
as output, where the nonlinear function is a sum of multivariate radial basis functions. While
this worked well for the prior network of the DBN, it performed poorly for the transition network.
This might be because there is not enough data to accurately estimate the parameters of a
nonlinear function. Hence we decided to try a non-parametric method. We found that a kernel
regression approach worked well for our data.

Prior Network. We assume that $P(X_i \mid \mathrm{Pa}(X_i))$ is a Gaussian distribution, and use kernel
regression to find the mean and standard deviation of this distribution. Recall that we are given
the graph and training data. The graph allows us to find the parent nodes of $X_i$. The training
data allow us to find instances of $(\mathrm{px}_k, x_k)$ corresponding to $(\mathrm{Pa}(X_i), X_i)$. Note that we also
have the actual value of $\mathrm{Pa}(X_i)$, which we call $\mathrm{pa}(X_i)$. Since a large number of the instances
$\mathrm{px}_k$ are far away from $\mathrm{pa}(X_i)$, we pick the $k$-nearest instances. The notation with the subscript
$k$ represents these nearest instances. We measure the distance with a Euclidean-distance metric:
$D(\mathrm{px}_k, \mathrm{pa}(X_i))$. We then compute a weight for each instance:

\[ w_k = \exp\left( \frac{-D(\mathrm{px}_k, \mathrm{pa}(X_i))}{K_w} \right) \tag{5.5} \]

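where $K_w$ is the kernel width. Next, we compute a weighted mean and variance based on these
weights:

\[ \mu(X_i) = \frac{\sum_k w_k x_k}{\sum_k w_k}, \qquad \mathrm{var}(X_i) = \frac{\sum_k w_k \big(x_k - \mu(X_i)\big)^2}{\frac{n_k - 1}{n_k} \sum_k w_k} \tag{5.6} \]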


where $n_k$ is the number of non-zero weights $w_k$, and the standard deviation $\sigma(X_i)$ is the square
root of the above variance. With the mean and standard deviation of $X_i$, we can now compute
$P(X_i \mid \mathrm{Pa}(X_i))$, and thereby the scores for the structure search. For the prior network, we have
cases where $X_i$ has no parents. To compute $P(X_i)$, we find instances of $x_k$ corresponding to $X_i$.
The mean and standard deviation of $X_i$ are then the mean and standard deviation of the instances
$x_k$.
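
The k-nearest kernel regression described above can be sketched as follows. The sketch assumes an exponential kernel of the form exp(-D / K_w) and the standard weighted sample variance with an (n_k - 1) / n_k correction; the function name and argument layout are hypothetical.

```python
import math


def kernel_regression(instances, query, k, kernel_width):
    """Estimate the mean and standard deviation of a node given its parent values.

    instances: list of (parent_values, child_value) pairs from the training data.
    query: the actual parent values pa(X_i).
    Assumes at least one of the k nearest instances has a non-zero weight.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Keep only the k instances whose parent values are closest to the query.
    nearest = sorted(instances, key=lambda inst: dist(inst[0], query))[:k]
    weights = [math.exp(-dist(p, query) / kernel_width) for p, _ in nearest]
    values = [x for _, x in nearest]

    w_sum = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / w_sum
    n_k = sum(1 for w in weights if w > 0.0)
    if n_k < 2:
        return mean, 0.0  # the variance is undefined for a single instance
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, values)) / ((n_k - 1) / n_k * w_sum)
    return mean, math.sqrt(var)
```

A smaller `kernel_width` concentrates the estimate on the closest instances, while a larger one approaches an unweighted average over the k neighbors.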


Transition Network. We compute one distribution for each node $i$ at time $t+2$ ($X_i[t+2]$).
The regression for the transition network is essentially the same as above with two important
modifications. The first modification is that we also have a weighted velocity term when computing
the distance function $D(\mathrm{px}_k, \mathrm{pa}(X_i[t+2]))$. This velocity term is $(X_i[t+1] - X_i[t])$
(recall that $X_i[t+1]$ and $X_i[t]$ are always parent nodes of $X_i[t+2]$). Including this term allows
us to find $k$ nearest instances that better match $\mathrm{pa}(X_i[t+2])$. The second modification is related
to $X_i[t+2]$, whose values we have to generate in order to compute probabilities and scores (as
in Figure 5.2). Instead of dealing with “absolute” $X_i[t+2]$ values, we deal with “delta” $X_i[t+2]$
values. In Equation 5.6, instead of $x_k$ representing the $k$ instances of $x_i[t+2]$ (where lower $x$
means actual value), $x_k$ now represents the $k$ instances of $(x_i[t+2] - x_i[t+1])$. And instead
of $X_i$ representing $X_i[t+2]$, $X_i$ now represents $(X_i[t+2] - X_i[t+1])$. To generate an actual
value of $X_i[t+2]$, we take $\mu(X_i[t+2] - X_i[t+1])$ and add this to the existing $X_i[t+1]$ value.
Intuitively, since the “absolute” values have a much wider range, sampling from this range will
require post-process smoothing. The “delta” values have a small range, and sampling from it is
more robust and no post-process smoothing is needed.

5.4  Synthesis  of  New  Variants


We can use the learned structure and the input data to synthesize an unlimited number of new
spatial and temporal variants. Since the DBN represents a joint probability distribution, we
sample from this distribution to synthesize new variants. We represent the $\mu$’s and $\sigma$’s that are
computed for each node as a set $(\mu, \sigma)$. If we pick $\sigma = 0$, this gives the mean motion of the
inputs. The set $(\mu, \sigma)$ represents variations of motions away from this mean motion. Note that
the $\mu$’s are not fixed, since the $\mu$’s and $\sigma$’s from previous time frames can affect the $\mu$’s in later
time frames.


Prior Network. We synthesize the first 2 frames of a new motion with the prior network. We
first find the partial ordering of the $2n$ nodes in the prior network. Such an ordering always exists
since BNs are acyclic. We generate values for each of these nodes according to this ordering.
The nodes at the beginning will be the ones without parents. We sample a value from the
Gaussian distribution of each of these nodes. The rest of the nodes will depend on values already
generated. We use the procedure in Section 5.3.2 to find the mean and standard deviation for
each node, except that we use the learned structure and all the $n_{prior}$ instances every time. We
then sample a value from the distribution of each node.
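
Sampling the prior network in an order where every node follows its parents can be sketched as below. The `conditional` callable is a hypothetical stand-in for the regression of Section 5.3.2 and is assumed to return a (mean, standard deviation) pair; the node names in the usage are invented.

```python
import random


def topological_order(parents):
    """Return the nodes ordered so that every node comes after all of its parents."""
    order, placed = [], set()
    remaining = dict(parents)
    while remaining:
        ready = [v for v, pa in remaining.items() if all(p in placed for p in pa)]
        if not ready:
            raise ValueError("graph has a cycle")
        for v in ready:
            order.append(v)
            placed.add(v)
            del remaining[v]
    return order


def ancestral_sample(parents, conditional, rng):
    """Sample one value per node, parents first.

    conditional(node, parent_values) returns (mean, std) for that node.
    """
    values = {}
    for node in topological_order(parents):
        mean, std = conditional(node, [values[p] for p in parents[node]])
        values[node] = rng.gauss(mean, std)
    return values
```

Calling `ancestral_sample(parents, conditional, random.Random(0))` repeatedly with different seeds gives different first two frames, and a standard deviation of zero for every node reproduces the mean values.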


Transition Network. Given the first 2 frames, we synthesize subsequent frames by “unrolling”
the DBN (Figure 5.3). We perform one non-parametric regression for each node at each time
frame. We use the learned structure and all the $n_{trans}$ instances every time. We use the procedure
in Section 5.3.2 to compute actual values of $X_i[t+2]$. The main difference is that after computing
$\mu(X_i[t+2] - X_i[t+1])$ and $\mathrm{var}(X_i[t+2] - X_i[t+1])$, we sample from this distribution and add the
value to the existing $X_i[t+1]$ value to get the $X_i[t+2]$ value. No post-process smoothing operation
is needed. If the input motions are cyclic, we can synthesize a continuous and unlimited stream
of new poses.
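
Unrolling the transition network frame by frame can be sketched as follows. The `delta_stats` callable is a hypothetical stand-in for the regression over “delta” values: it is assumed to return, for each DOF, the mean and standard deviation of the change from frame t+1 to frame t+2.

```python
import random


def synthesize(first_two, n_frames, delta_stats, rng):
    """Generate n_frames frames, starting from the two frames given by the prior network.

    delta_stats(prev2, prev1) returns one (mean, std) pair per DOF for the
    difference X_i[t+2] - X_i[t+1].
    """
    frames = [list(first_two[0]), list(first_two[1])]
    while len(frames) < n_frames:
        prev2, prev1 = frames[-2], frames[-1]
        stats = delta_stats(prev2, prev1)
        # Sample a delta per DOF and add it to the latest frame.
        delta = [rng.gauss(m, s) for m, s in stats]
        frames.append([x + d for x, d in zip(prev1, delta)])
    return frames
```

Because each new frame is built from previously sampled frames, two runs with different random seeds diverge in both pose and timing rather than differing by independent per-frame noise.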


5.5  Constraints 

Figure 5.3: We “unroll” the DBN from Figure 5.1 to synthesize new variants. We show here the
unrolled network for 5 time frames. Note that the first two frames come from the prior network of
the DBN and may not contain cycles. Since the DBN represents a joint probability distribution
over the possible trajectories of each DOF, we sample from this distribution to generate new
variants. It is important to recognize that the synthesized motion does not have a one-to-one
correspondence with any one of the input motions. This means that the synthesized motion is not
just a copy of one of the input motions plus some slight differences, but the timing of the whole
motion itself is different. Furthermore, no new pose is exactly the same as any previous pose.

The synthesized poses from the previous section might need to be cleaned up for handling foot
and hand constraints. This fixes footskate problems and also deals with cases where the foot/hand
has to be at a specific position. We develop an inverse kinematics framework that fits with our
DBN approach. Intuitively we have to satisfy three constraints: (i) the foot/hand needs to be at
specific positions at certain times, (ii) the solution should be close to the mean values (at each
node and time) predicted by the DBN, and (iii) the solution should maintain smoothness with
respect to the previous frames. The first constraint is a hard inverse kinematics constraint while
the last two are soft constraints. This naturally leads to an optimization solution:


\[ \min_{\mathbf{q}_t} \; w_1 \|\mathbf{q}_t - \bar{\mathbf{q}}_t\|^2 + w_2 \|\mathbf{q}_t - 2\mathbf{q}_{t-1} + \mathbf{q}_{t-2}\|^2 \quad \text{s.t.} \quad \|f(\mathbf{q}_t) - \mathrm{pos}\|^2 = 0 \tag{5.7} \]

where $\mathbf{q}_t$ is the set of DOFs for one foot or hand at time $t$. There are 6 joint angles for each
foot, and 7 for each hand. $\bar{\mathbf{q}}_t$ is the set of mean values (of the corresponding nodes and time)
predicted by the DBN, and $\mathbf{q}_{t-1}$ and $\mathbf{q}_{t-2}$ are the DOFs from the previous two frames. $f()$ is the
forward kinematics function that gives the end-effector 3D position corresponding to $\mathbf{q}_t$, and
“pos” is the 3D position that we want the foot/hand to be at. We run an optimization for each
foot/hand and time frame separately. If there is a large amount of motion, these 3D positions and
frames can be found with automated methods. But we find that it is not difficult to identify
these manually for our motions. We initialize the optimization with the solution we sampled
from the DBN. Since the solution we get from Section 5.4 is already close to what we want, the
optimization only makes minor adjustments and is therefore efficient. The optimization uses a
sequential quadratic programming method. We set $w_1$ to 1 and $w_2$ to 5.

5.6  Evaluation


A  main  result  of  our  work  is  that  we  can  synthesize  spatial  and  temporal  variants  of  the  input 
examples.  Spatial  variation  means  that  no  new  pose  is  exactly  the  same  as  any  of  the  input  poses 
or  previously  synthesized  poses.  Spatial  differences  can  usually  be  better  seen  in  images  of 
poses.  Temporal  variation  means  that  a  new  variant  motion  has  a  different  timing  than  any  of  the 
input  motions  or  previously  synthesized  variant  motions.  It  is  important  to  recognize  that  a  new 
variant  does  not  have  a  one-to-one  correspondence  with  any  of  the  input  motions.  This  means 
that  the  new  variant  is  not  just  a  copy  of  one  of  the  input  motions  plus  some  slight  differences 
(as  is  the  case  in  previous  work),  but  the  timing  of  the  whole  motion  itself  is  different.  Temporal 
differences  can  usually  be  better  seen  in  animations.  The  new  variants  are  therefore  statistically 
and  visually  similar  to  the  inputs,  but  are  not  exact  copies  of  them. 

In  general,  we  expect  our  approach  to  work  on  time-series  data  with  DOFs  that  are  correlated. 
This  means  that  some  DOFs  are  correlated  with  others,  but  it  is  not  necessary  that  all  DOFs  are 
related  to  each  other.  The  DBN  model,  by  design,  works  on  these  types  of  data.  Experimentally, 
we  show  that  our  approach  works  for  many  types  of  human  motion  data. 

We  assume  that  our  data  satisfies  a  2nd-order  Markov  property  in  our  DBN  model.  There 
are  two  questions  that  arise  from  this  assumption.  The  first  question  is  why  we  used  a  2nd- 
order model instead of 1st, 3rd or higher orders. We tried our algorithm by assuming a 1st-order
property.  While  we  can  learn  a  structure  and  generate  the  first  frame  of  motion  from  the  prior 
network,  the  subsequent  frames  that  are  generated  by  the  transition  network  do  not  make  sense 
at  all.  After  a  few  frames,  the  new  poses  will  diverge  away  from  the  poses  in  the  input  motions. 
Intuitively, the algorithm is unable to find nearest instances that are truly “near” the existing
previous  frame,  and  hence  it  cannot  generate  the  corresponding  next  frame  accurately.  However, 
we  find  that  using  two  previous  frames  works  well  in  finding  the  k  nearest  instances,  and  this 
is  the  reason  that  a  2nd-order  model  works  well.  We  did  not  try  3rd  or  higher  order  models, 
since  we  already  have  a  simpler  (2nd-order)  model  that  works  well.  We  believe  that  higher 
order  models  will  produce  similar  results  while  taking  a  longer  runtime.  The  second  question 
is  whether  considering  the  previous  2  frames  is  enough.  Although  there  are  only  2  frames,  for 
our  human  motion  with  62  DOFs,  there  are  actually  124  pieces  of  information.  This  information 
is  enough  for  the  algorithm  to  find  the  nearest  “patches”  (or  nearest  instances)  of  input  data,  in 
order to perform the non-parametric regression to generate a subsequent frame.

Figure 5.4: Results for cheering, walk cycle, and swimming motion. In each column, the top
image shows the 4 inputs (overlapped, each with different color) and the bottom image shows
the 15 outputs (overlapped, each with different color). These are frames from the animations.


5.6.1  Results  for  Full-body  Human  Animation 


We show results for four types of human motion data: cheering, walk cycle, swimming breast
stroke, and jumping. We use respectively 433, 322, 384, and 309 frames of data (at 60 frames
per second) as input. These are the total number of frames for each motion type. We have four
input motion clips in each case.


We find that four input motions is the smallest number that learns a DBN structure that gives
good results. We can learn a structure with a smaller number of inputs, but it does not synthesize
reasonable motion at all. A larger number of inputs also works well, but we show the robustness
of our method by showing that it works with only a few inputs. We use values of k from 15 to 60
when finding the k nearest instances. In the learned DBN structure, each node has between 2 and 15
parent nodes (except for the nodes in the prior network that have no parents). After learning a
structure, we can synthesize variants of the four inputs. The results for cheering, walk cycle,
and swimming breast stroke motions (Figure 5.4) show variants generated with the four inputs in
each case. Given the learned structure and just one input motion clip, we can also use the same
approach to synthesize variants of that single input. The results for jumping motions (Figure
5.5) show variants generated with just one input. In addition, if the motion is cyclic, we can
synthesize a continuous stream of new cycles. We have examples of animations for walk cycles
and swimming motion.

Figure 5.5: Given the learned structure and just one jumping motion as inputs, we synthesize
four new variant motions. We overlap poses from these four new motions at similar time phases
of the jump. We can see the variations in the poses at these time phases. The poses for the head
vary the least because the head poses also vary the least in the input data.
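
The nearest-instance regression mentioned above can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the thesis implementation: the patch is the full pair of previous frames (so 2 x 62 = 124 values for our data) rather than the learned DBN parents of each node, the kernel is assumed to be Gaussian, and the function returns the weighted mean instead of drawing a sample. The names `build_examples`, `predict_next_frame`, and `kernel_width` are hypothetical.

```python
import math


def build_examples(clip, history=2):
    """Turn one clip (a list of frames, each a list of DOF values) into
    (patch, next_frame) pairs, where a patch is `history` flattened frames."""
    examples = []
    for t in range(history, len(clip)):
        patch = [value for frame in clip[t - history:t] for value in frame]
        examples.append((patch, clip[t]))
    return examples


def predict_next_frame(examples, query, k, kernel_width):
    """Kernel-weighted average of the frames that followed the k patches
    nearest to `query` (ASSUMPTION: Gaussian kernel on squared distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(examples, key=lambda ex: sq_dist(ex[0], query))[:k]
    weights = [math.exp(-sq_dist(patch, query) / (2.0 * kernel_width ** 2))
               for patch, _ in nearest]
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed to zero: fall back to a plain average.
        weights = [1.0] * len(nearest)
        total = float(len(nearest))
    dims = len(nearest[0][1])
    return [sum(w * nxt[d] for w, (_, nxt) in zip(weights, nearest)) / total
            for d in range(dims)]
```

With k between 15 and 60, as in our experiments, `examples` would be built from all four input clips of one motion type before calling `predict_next_frame`.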

Figure 5.6 shows graphs of the input and output cheering motions. While these graphs are
for cheering motion, they are typical of similar graphs of other motion types. Note that the new
output variants follow the general trajectories of the inputs, but are not exactly the same. In the
middle column of the figure, we can see some of the joint correlations. For example, knowing
the value of the right shoulder can help us predict the value of the left shoulder. These joint
relationships are learned automatically. For the right column of the figure, we performed PCA of
the input and output data, and plotted the results from the first few PCA dimensions. The PCA
reduces the 62-DOF data to 11 dimensions, keeping more than 99% of the energy.
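
As a small illustration of the energy criterion, the sketch below picks the smallest number of principal components whose share of the total variance reaches a target. It assumes that "energy" means the fraction of the summed PCA eigenvalues; the function name and the example eigenvalues are hypothetical, not values from our data.

```python
def dims_for_energy(eigenvalues, target=0.99):
    """Smallest number of leading principal components whose cumulative
    share of the total variance (sum of eigenvalues) reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return count
    return len(ordered)


# Made-up spectrum: three components are needed to keep 85% of the energy.
print(dims_for_energy([5.0, 3.0, 1.0, 0.5, 0.5], target=0.85))
```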

5.6.2  Memory  and  Performance  Time 

Memory is needed for storing the learned DBN structure and the four input motions. The DBN
structure consists of a set of sparse directed edges in G_prior and G_trans, and the means and
standard deviations of the nodes in G_prior that have no parents. The memory for the DBN structure
is small, and hence the total memory is essentially that of the four input motions. It takes
between half an hour and two hours to learn the DBN structure for each type of human motion. This
learning process can be done offline. The runtime process of synthesizing new human motion takes
about 200 ms to generate 1 second of motion.

Figure 5.6: Plots of four inputs (in blue) and fifteen output variants (in black or green) for
cheering motion. Each curve represents one motion clip. Note that these motions are not cyclic.
Left Column: Two selected plots of DOF vs. time. Middle Column: Two selected plots of DOF
vs. DOF. Right Column: Two selected plots of PCA-dimension vs. PCA-dimension.
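
The figures above imply some simple back-of-the-envelope numbers, worked out in the sketch below. The frame counts, DOF count, frame rate, and 200 ms synthesis time come from the text; the 4-byte storage per value is an assumption made only for illustration, since the storage format is not stated.

```python
FRAME_RATE = 60            # frames per second (from the text)
DOFS = 62                  # degrees of freedom per frame (from the text)
SYNTH_MS_PER_SECOND = 200  # synthesis time per second of motion (from the text)
BYTES_PER_VALUE = 4        # ASSUMPTION: single-precision floats

# Total input frames per motion type (from Section 5.6.1).
frames_per_type = {"cheering": 433, "walk cycle": 322,
                   "swimming": 384, "jumping": 309}


def input_memory_bytes(total_frames, dofs=DOFS, bytes_per_value=BYTES_PER_VALUE):
    """Approximate storage for the raw input frames of one motion type."""
    return total_frames * dofs * bytes_per_value


realtime_factor = 1000 / SYNTH_MS_PER_SECOND   # seconds of motion per second of compute
ms_per_frame = SYNTH_MS_PER_SECOND / FRAME_RATE

for name, frames in frames_per_type.items():
    print(f"{name}: about {input_memory_bytes(frames) / 1024:.1f} KiB of input data")
print(f"synthesis runs about {realtime_factor:.0f}x faster than real time "
      f"({ms_per_frame:.2f} ms per frame)")
```

Under these assumptions the input data for each motion type is on the order of a hundred kilobytes, and synthesis uses roughly a fifth of the real-time budget for one character.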


5.6.3  User  Study 

We performed two experiments in the user study. For Experiment A, we compare “Our Variants”
with motion capture data. “Our Variants” are motion clips generated by our approach. The
purpose is to decide which is more natural. We ran this experiment for cheering motion and
walk cycles separately. Each user watches a random mixture of 15 of these motion clips. After
watching each motion, we ask the user to provide a score from 1 to 9 (inclusive) of how natural or
human-like that motion is. A higher score corresponds to more naturalness. We tested 15 users,
and we have a total of 225 scores. We performed ANOVA on these scores. For cheering motion,
p is 0.930, which suggests that the means from the two samples (of “Our Variants” and motion
capture data) are not significantly different. For walk cycles, p is 0.578, which again suggests
that the means from the two samples are not significantly different. Therefore, for both cheering
motion and walk cycles, motion synthesized by our approach was not found to be significantly
less natural than motion capture data.

For Experiment B, we compare “Our Variants” with “Cycle Animation”. “Our Variants” are
long sequences where each sequence consists of at least 15 motion clips generated by our
approach. These motion clips are all slightly different. A “Cycle Animation” is a long sequence
consisting of at least 15 motion clips, each of which is randomly selected from the 4 input
motions. The name “Cycle Animation” is inspired by common techniques used in films to generate
the motions for many characters using a small number of motion clips. Since this is a general
term, we cannot say that our sequences are similar to the ones used in films, as there are many
other parameters that can be considered in an actual film setting. The purpose is to decide which
is more repetitive. Specifically, a long sequence is repetitive if many of the motion clips within
the sequence are exactly repeated. We ran this experiment for cheering motion and walk cycles
separately. Each user watches a random mixture of 15 of these long sequences. After watching
each sequence, we ask the user to provide a score from 1 to 9 (inclusive) of how repetitive that
sequence is. A higher score corresponds to more repetition. We tested 15 users, and we have a
total of 225 scores. We performed ANOVA on these scores. For cheering motion, p is 1.02e-8,
which suggests that the means from the two samples (of “Our Variants” and “Cycle Animation”)
are significantly different. For walk cycles, p is 4.49e-8, which again suggests that the
means from the two samples are significantly different. Therefore, for both cheering motion and
walk cycles, “Our Variants” are perceived as less repetitive than “Cycle Animation”. In
Experiment B, note that each long sequence has at least 15 motion clips. It takes some time to
recognize whether or not there are clips that are exactly repeated. Hence Experiment B does not
apply to relatively short animations, since “motion clones” are difficult to detect in short
animations (as shown in previous work).
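
For readers unfamiliar with the test, the following sketch computes the F statistic behind a one-way ANOVA. It assumes a one-way design with one group of scores per condition (the text above does not spell out the design), and the scores in the usage lines are made-up placeholders, not the study data. The function name is hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over
    a list of groups, each a list of numeric scores."""
    all_scores = [s for g in groups for s in g]
    grand_mean = mean(all_scores)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((s - m) ** 2
                    for g, m in zip(groups, group_means) for s in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder scores on the 1-to-9 scale, NOT the data from the user study.
variants = [6, 7, 5, 6, 8]
reference = [6, 6, 7, 5, 7]
print(one_way_anova_f([variants, reference]))
# The p-value is the upper tail of the F distribution at F with these degrees
# of freedom; if SciPy is available, scipy.stats.f.sf(F, df_between, df_within)
# gives it.
```

With two conditions and 225 scores, such a design would have 1 and 223 degrees of freedom.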

5.6.4  Experiments  with  Adding  Noise 

A simple possible approach to generate variation is to add noise to existing motion. We
experimented with two such methods on the walk cycle data. The first is a naive or strawman method.
We time-warp the four input walk cycles, compute simple statistics of the time-warped data for
each DOF separately, and use this information to add smoothed noise to one of the four input
cycles. We check that the noise-added motion is changed by a similar amount compared to the
variants that our DBN approach generates. We do so by taking pairs from our fifteen variants
and the four inputs (each pair has one variant and one input), computing the normalized sum
of squared differences of joints between each pair, and modeling these sums as a normal
distribution. We also compute the normalized sum of squared differences of joints between the
noise-added motion and its corresponding input, and check that this sum is within one standard
deviation of the mean of the normal distribution above. We tried an example where we added
noise only to the left shoulder and elbow (Figure 5.7, left). The resulting walking motion shows
that the left shoulder/arm motion is unnatural, and does not fit with the rest of the walking
motion. In contrast, our DBN approach will learn that the left shoulder is correlated with other
joints, and handle these issues automatically. We tried another example where we added noise
to all joints. While the overall walk motion still exists, it is obvious that the poses and timing of
the motion are awkward. Furthermore, adding noise requires a smoothing process that can take
away details from the original motion.

Figure 5.7: Left: Example frame from walk cycle motion clip with strawman noise added to
left shoulder/arm. The left shoulder turns in a way that does not synchronize with the right arm.
Right: Example frame from walk cycle motion clip with Perlin noise added to right hip/knee.
The right hip and knee pause and move in a way that does not synchronize with the rest of the walk
cycle. In both cases, the unnaturalness of the timing of the whole walk cycle can be better seen
in the animations.
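
The noise-amount check described in this section can be sketched as follows. The sketch makes several assumptions that the text leaves open: motions are time-aligned lists of frames of equal length, "normalized" means dividing by the number of values compared, and every variant is paired with every input. The function names are hypothetical.

```python
from statistics import mean, stdev


def normalized_ssd(motion_a, motion_b):
    """Sum of squared joint differences between two time-aligned motions,
    divided by the number of values compared (ASSUMED normalization)."""
    total, count = 0.0, 0
    for frame_a, frame_b in zip(motion_a, motion_b):
        for a, b in zip(frame_a, frame_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def noise_amount_is_comparable(variants, inputs, noisy, noisy_source):
    """True if the noise-added motion differs from its source by an amount
    within one standard deviation of the mean variant-to-input difference."""
    pair_ssds = [normalized_ssd(v, i) for v in variants for i in inputs]
    mu, sigma = mean(pair_ssds), stdev(pair_ssds)
    return abs(normalized_ssd(noisy, noisy_source) - mu) <= sigma
```

If the check fails, the noise amplitude would be adjusted and the check repeated, so that the noise-added motion and the DBN variants are compared at a similar amount of change.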

The second method is to add band-limited noise to one of the four input cycles with the Perlin
noise function. We also perform the same noise-addition check as in the first method. Figure
5.7 (right) shows an example where we added Perlin noise to the right hip/knee joints of a walk
cycle from motion data. The right hip and knee pause in such a way that they do not synchronize
with the rest of the motion. We also tried examples where we added noise to several joints. In
general, we found that a trial-and-error process of manual parameter tuning is needed. Most
importantly, a human understanding of the motion (e.g., if the left arm swings higher, the right
arm is more likely to swing higher) is required to add noise the right way. Otherwise, the motion
can become spatially or temporally awkward.

Figure 5.8: Two examples of motion sets that do not work with our approach. There are five
overlapped walk cycles in each case. Four of them are inputs (in blue) that are similar, and the
other one (in magenta) does not fit together with these four. Left: For the one that does not fit,
the arms swing higher than the other four. Right: For the one that does not fit, the motion turns
slightly to the right.
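
To make the second noise method of Section 5.6.4 concrete, here is a minimal one-dimensional gradient-noise sketch in the style of Perlin noise. It is not necessarily the exact function used in our experiments: it uses the quintic fade curve from Perlin's improved noise, and `amplitude` and `frequency` stand for the kind of parameters that had to be tuned by hand. All names are hypothetical.

```python
import math
import random


def make_gradient_noise_1d(seed=0, table_size=256):
    """Build a 1D Perlin-style noise function from a table of random gradients."""
    rng = random.Random(seed)
    gradients = [rng.uniform(-1.0, 1.0) for _ in range(table_size)]

    def fade(t):
        # Quintic smoothing curve 6t^5 - 15t^4 + 10t^3.
        return t * t * t * (t * (t * 6.0 - 15.0) + 10.0)

    def noise(x):
        cell = math.floor(x)
        t = x - cell
        g0 = gradients[cell % table_size]
        g1 = gradients[(cell + 1) % table_size]
        n0 = g0 * t           # contribution of the left lattice point
        n1 = g1 * (t - 1.0)   # contribution of the right lattice point
        return n0 + fade(t) * (n1 - n0)

    return noise


def add_noise_to_dof(curve, noise, amplitude, frequency, frame_rate=60):
    """Add band-limited noise to one DOF curve (a list of per-frame values)."""
    return [value + amplitude * noise(frequency * i / frame_rate)
            for i, value in enumerate(curve)]
```

Because each DOF receives its own independent noise curve here, nothing ties the noise on one joint to the motion of the others, which is the source of the lack of synchronization discussed above.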

5.7  Inputs  that  work  well  with  Our  Approach 

One limitation of our approach is that the input motion examples have to be “similar but slightly
different”. They have to be “similar” because we are learning a model for that particular type
of motion. They have to be “slightly different” because the small differences among the inputs
are where we get the variation from. In the results section, we have shown examples that work
with our approach. Here, we describe and show examples of inputs that do not work with our
approach. For our walk cycle data, we have four input motions where the character walks two
steps forward. If there is a walk cycle where the character swings the arms much higher (Figure
5.8, left), it will not fit with the original four motions and will not work as another input motion.
We are still able to use the five motions to learn a DBN structure, but the synthesis step may not
produce a reasonable output motion. For this and similar cases, we recommend using the method
described below to first eliminate the ones (i.e., the cycle with the higher arm swing in this case)
that do not fit with the rest of the motions. However, if we have four or more of such
higher-arm-swinging walk cycles, they can be used together in our approach effectively. Another
example is a walk cycle where the character turns slightly to one side while walking forward
(Figure 5.8, right). This will also not fit with the original four inputs.

It is difficult to precisely define what is meant by “similar” motions. Instead of making such
a definition, we introduce a method (that we can precisely describe) to characterize the types of
inputs that work well with our approach. Our method is data-driven and based on DBNs. The
overall idea is to take a given set of training data that we already know works well with our
approach. We then take a new set of testing data, and eliminate motion clips from this set one by
one until we are left with a set that can also work well with our approach.

Figure 5.9: Left: Plot of frequency versus likelihoods for the training data. Right: We started
with a testing set of eight walk cycles, and our method selected these five to be similar to the
ones in the training set.


We start with a given set of training data that we know works well. This data can either be
selected manually (i.e., we tested the clips with our approach) or can come from the results of this
method. As an example, we started with eight walk cycles that we have already selected. We
split these eight into groups of six and two. We learn a DBN with the group of six and compute
likelihoods with the learned DBN for each of the other two. We use the likelihoods that are
described in Section 5.3. We repeat this process for different combinations of six and two. The
idea is to get a number of likelihoods that we can use to characterize the training data. Figure 5.9
(left) shows a plot of the likelihoods that we got from this procedure. The training data is only
for walk cycles. We tried to include a training set of cheering motions. However, we found that
because the legs and feet of the character do not move during the cheering motion, those joint
values can be well predicted by default. We therefore decided that the training data should only
contain one type of motion (here, walk cycles).
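
The six-and-two procedure can be sketched as a hold-two-out loop. This is only an illustration under assumptions: it enumerates every possible split, whereas the text only says that different combinations were used, and `learn_model` and `likelihood` are hypothetical callables standing in for DBN structure learning and the likelihood of Section 5.3. The toy stand-ins at the bottom exist only to make the sketch runnable.

```python
from itertools import combinations


def held_out_likelihoods(clips, learn_model, likelihood, held_out=2):
    """For every way of holding out `held_out` clips, learn a model on the
    rest and score each held-out clip. Returns (clip_index, score) pairs."""
    scores = []
    for test_idx in combinations(range(len(clips)), held_out):
        train = [clip for i, clip in enumerate(clips) if i not in test_idx]
        model = learn_model(train)
        for i in test_idx:
            scores.append((i, likelihood(model, clips[i])))
    return scores


# Toy stand-ins (NOT the DBN): a "clip" is a number, the "model" is the mean
# of the training clips, and the "likelihood" falls off with squared distance.
def toy_learn(train):
    return sum(train) / len(train)


def toy_likelihood(model, clip):
    return -(clip - model) ** 2


toy_clips = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(len(held_out_likelihoods(toy_clips, toy_learn, toy_likelihood)))
```

With eight clips there are 28 possible six-and-two splits, so enumerating all of them yields 56 likelihoods, seven per clip.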


With the set of likelihoods from the training data, we can set a threshold (Figure 5.9, left) for
deciding the likelihoods that we should accept in the new testing data. This is a parameter that
we choose ourselves. We set the threshold to be the tenth percentile of all the likelihoods. We
can now take a new testing set of motion clips. We used a new test set of eight walk cycles in our
example. We again separate this set into different groups of six and two, so that we can compute
a likelihood for each motion clip. We then eliminate the motion clip with the lowest likelihood
if it is lower than the threshold. We now have seven motion clips, and we repeat the process of
computing likelihoods for each of the seven clips. We continue until the lowest likelihood
is above the threshold. In our example, this process stops with five walk cycles (Figure 5.9, right).
The clips that were eliminated have different speeds, heights of arm swings, and angles of turning
compared to the remaining five. If all the clips in the test set are different from the training data,
the method will eliminate all of them.
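
The thresholding and elimination steps can be summarized in a short sketch. It assumes a simple rank-based percentile and treats the per-clip scoring as a hypothetical callable `score_clips`, which would recompute the likelihoods of the remaining clips on every round; the toy scorer is only there to make the example runnable and is not the DBN likelihood.

```python
from statistics import median


def percentile_threshold(training_likelihoods, fraction=0.10):
    """Rank-based percentile of the training likelihoods (rounds the rank down)."""
    ordered = sorted(training_likelihoods)
    return ordered[int(fraction * (len(ordered) - 1))]


def filter_clips(clips, score_clips, threshold):
    """Repeatedly drop the lowest-scoring clip while it is below the threshold."""
    remaining = list(clips)
    while remaining:
        scores = score_clips(remaining)  # one likelihood per remaining clip
        worst = min(range(len(remaining)), key=scores.__getitem__)
        if scores[worst] >= threshold:
            break
        del remaining[worst]
    return remaining


# Toy scorer (NOT the DBN likelihood): clips are numbers, and a clip scores
# lower the further it lies from the median of the remaining clips.
def toy_scores(clips):
    center = median(clips)
    return [-(c - center) ** 2 for c in clips]


print(filter_clips([1.0, 1.1, 0.9, 5.0, 1.05], toy_scores, threshold=-0.5))
```

Rescoring after each removal matters because the likelihood of a clip depends on the other clips that remain in the set.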


5.8  Discussion 

We  have  presented  a  method  for  modeling  and  synthesizing  variation  in  motion  data.  We  use 
a  Dynamic  Bayesian  Network  to  model  the  input  data.  This  allows  us  to  build  a  multivariate 
probability  distribution  of  the  data,  which  we  sample  from  to  generate  new  motion.  Given  input 
data  of  a  type  of  motion,  our  model  can  be  used  to  generate  new  spatial  and  temporal  variants  of 
that  motion.  We  show  that  our  approach  works  with  a  variety  of  full-body  human  motion.  For 
applications  such  as  crowd  animation,  our  method  has  the  advantage  of  being  able  to  take  small, 
pre-defined  example  cycles  of  motion,  and  generate  many  variations  of  these  cycles. 

Our  contribution  and  additional  value  over  previous  work  is  in  the  study  of  the  problem  of 
generating  variation  in  motion  data,  which  is  still  relatively  unexplored.  Instead  of  considering 
variation  as  an  additive  noise  component,  we  take  a  data-driven  approach  and  apply  learning 
techniques  to  this  problem.  We  introduce  a  novel  method  to  model  and  synthesize  variation  for 
many  types  of  motion  data.  Our  model  takes  a  small  number  of  input  motions,  and  synthesizes 
spatial  and  temporal  variants  that  are  statistically  similar  to  the  inputs. 

One important limitation is that the input examples must come from a particular type of
motion (e.g., walk cycles or swimming). Our current approach cannot combine different motion
types. The inputs also have to be “similar”, but we specifically describe how we can get inputs
that work well with our approach.

We have shown that it is possible to automatically generate spatial and temporal motion
variants from a small amount of data. One interesting insight is that our non-parametric technique
is similar to texture synthesis methods. Non-parametric texture synthesis methods search for
similar patches of the previous pixels in order to generate the next pixel. Our method also searches
for similar patches of previous frames in order to generate the next frame of motion.

In  general,  we  believe  that  the  overall  problem  of  generating  motion  variation  is  still  relatively 
unexplored  and  there  are  many  open  issues.  Our  work  is  just  the  beginning  and  we  think  of  it  as 
one step towards solving the overall problem.

Chapter 6

Putting  it  All  Together 


We  started  with  the  realistic  generation  of  navigation  motion  for  many  characters  as  our  goal.  In 
Chapter  3,  we  described  a  method  to  plan  for  sequences  of  motion  for  virtual  characters  to  reach 
user-specified  goal  locations.  We  then  explored  a  technique  to  speed  up  the  motion  synthesis 
process by using the idea of precomputation in Chapter 4. In Chapter 5, we presented a method
for generating variations of a small number of existing motion clips. We now discuss
ways to combine these approaches.

We first discuss how to incorporate our variation method into the behavior planning
framework. We begin with four motion clips of captured data for each behavior. For example, we have
four similar but slightly different motion clips in the “forward jogging” node. There are two
choices we can take here: (i) we can learn a DBN structure and generate a number (e.g., ten
to a hundred) of clips for each behavior in advance, and store them for use in real-time; or (ii) we
can learn a structure in advance, and then generate the variants in real-time. We suggest the first
choice since it does not require a lot of memory to store the generated variants. We then have the
same set of behaviors or nodes as before, except with more motion clips per node. The number
of clips required depends on the number of characters, camera viewpoint, size of environment,
and (most importantly) user perception. It would be a good direction of future work to explore
the number of different clips needed for the user to perceive enough motion variation. Given the
extended set of motion clips, we can build the tree of our behavior planning framework in the
same way as before, except that we instantiate one specific clip from every node that we choose
during runtime. Even though the motion clips can be slightly different in the overall position and
timing, this will not affect the rest of the algorithm. The planner can run the same way as before.
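
A minimal sketch of the first choice, in which each behavior node stores its captured clips together with precomputed variants and one clip is instantiated whenever the node is used, might look as follows. The class and function names are hypothetical, clips are represented only by identifiers, and uniform random selection is an assumption.

```python
import random
from dataclasses import dataclass, field


@dataclass
class BehaviorNode:
    """A behavior (e.g. "forward jogging") with several interchangeable clips."""
    name: str
    clips: list = field(default_factory=list)  # captured clips plus stored variants

    def instantiate(self, rng):
        """Pick one concrete clip for this use of the behavior."""
        return rng.choice(self.clips)


def instantiate_plan(plan, rng=None):
    """Replace each behavior in a planned sequence with one specific clip."""
    rng = rng or random.Random()
    return [node.instantiate(rng) for node in plan]


jog = BehaviorNode("forward jogging", [f"jog_variant_{i}" for i in range(10)])
print(instantiate_plan([jog, jog, jog], random.Random(0)))
```

The planner itself is unchanged in this sketch: it still searches over behaviors, and the choice of clip is made only when a behavior is placed into the output motion.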

We can also incorporate our variation method into the precomputed search trees framework.
Since the precomputation technique uses the same data structure and motion clips as the ones for
behavior planning, we are using all three techniques together here. As described above, we can
also start with four clips for each behavior or node. For incorporation into the precomputation
framework, we particularly suggest generating the new variants for each node in advance. This
will maintain the speed of the runtime search, which is an important feature of the whole
technique. The multiple clips in each node will again be slightly different in the position and timing.
In the tree precomputation step, as we expand a node in the tree, we instantiate one specific
motion clip randomly from the ones for that node. This should provide the variety that we can get
from the existing clips if we assume that each clip in the tree is equally likely to be used. Another
possibility is to timewarp the motion clips to get the same position and time between the first and
last pose in each clip of every node. However, this would take away from the variation that our
method generates, and hence we do not suggest timewarping them. Once the tree is built, the
gridmaps that are built for the tree and the runtime backward search step can execute the same
way as before.

In our implementation of the combination of the three methods, we precomputed the tree
by instantiating a specific motion clip for each node as it is expanded. In the motion data that
we were using for the precomputation method, we usually have only one clip of most types of
motion. Hence we took one clip of each of these motions: forward jogging, jogging and turning left
about 30 degrees, jogging and turning left about 45 degrees, jogging and turning right about 30
degrees, and jogging and turning right about 45 degrees. We take this original data set because it
is likely to be a more typical starting point in practice than one with multiple clips of each
motion type. It is a good test of our method to show that it can work with only these few
motion clips. Since the foot motions among these clips are different, we do not change them in
the new variants. We change only the arms and upper body motion. We have tested that these
upper body motions can be used well to learn a DBN structure. When learning a structure, we
also use the foot DOFs as potential independent variables. This allows the upper body motion
to synchronize with the lower body. We take the foot motions for the forward jogging, and
generate the upper body and arm motions using our method. Each variant has a slightly different
positioning of the character, but the same change in overall time because we are re-using the
foot motions. This is a tradeoff we have to make given the small amount of original data. We
generate twelve new variants of forward jogging this way. We also generate five new variants of
each type of jogging and turning motion, for thirty-two new variants in total. As we explained
above, the number of variants that we should generate depends on several factors, which could be
examined in a perception study in future work. We precompute one tree using the original five
motion clips, and another tree using the new variants together with the original five clips. We
generated the motions for multiple characters using these two trees. Figure 6.1 shows example
frames of the resulting animations from the two cases. Near the beginning of the generated motions
for the case with five inputs, we can see the repetition of the forward jogging clip since it
happened to use that clip for multiple characters side-by-side. For the case with the new variants,
we cannot observe this repetition quite as much. Another main difference between the two cases is
that the arm swinging motions are slightly different, although one has to watch the animations for
some time before recognizing the differences.

Figure 6.1: Left: Example frame of resulting animation generated with five original motion
clips. Right: Example frame of resulting animation generated with thirty-two variants together
with the five original clips.

Chapter 7

Discussion  and  Future  Work 


We have studied two main problems for generating virtual crowds: (i) how to model human
behaviors such that intuitive sequences of motions for a large number of characters can be generated
efficiently, and (ii) how to model and synthesize variation in motion data.

Our  contributions  are: 

•  A  planning  approach  that  applies  heuristic  search  methods  to  efficiently  generate  goal- 
driven  navigation  motion  for  virtual  human-like  characters.  Compared  to  methods  that 
use  large  data  sets  of  motion,  we  show  that  we  can  use  a  small  set  of  segmented  motion 
clips  to  generate  motions  for  a  large  number  of  characters  navigating  simultaneously  in 
dynamic  environments.  Specifically,  we  can  use  about  twenty  segmented  clips  to  generate 
the  navigation  motions  for  one  hundred  characters. 

•  A  novel  precomputation-based  approach  to  use  human  motion  data  to  generate  navigation 
motion:  we  first  precompute  a  search  tree  of  possible  motion  paths  with  the  data,  and  then 
use  a  backward  search  method  during  runtime  to  solve  planning  queries.  We  show  that 
our  approach  is  more  than  two  orders  of  magnitude  faster  than  traditional  forward  search 
methods  such  as  A*-search.  Although  there  are  tradeoffs  of  memory  and  completeness  (of 
solutions)  for  the  approach,  there  are  many  situations  where  these  tradeoffs  are  worthwhile 
given  the  faster  runtime  speed. 

• We present a technique for precomputing large diverse trees, and explore the advantages
and disadvantages of our method compared to previous methods for building diverse trees.
We have learned that a randomized approach for precomputing our trees works well
in terms of the trees being able to handle as many environments as possible.

• We study the problem of generating variation in motion data. Instead of considering
variation as an additive noise component, we take a data-driven approach and apply learning
techniques to this problem. We show that we can use Dynamic Bayesian Networks to
synthesize an unlimited number of variants automatically. This process does not require
manual parameter tuning and is not tedious compared to the major previous approach of
adding noise.

• We show that we can use our method to model and synthesize variation for many types of
human motion data. Our model takes a small number of input motions, and synthesizes
spatial and temporal variants that retain original features of the inputs but are not exact
copies of them. Our approach is novel in that there is no previous automated method that
can generate such variants for human motion data.

We think of our Behavior Planning framework as one method among a spectrum of methods.
The planning action space is carefully chosen such that animations can efficiently and easily be
generated. In general, we believe that our framework can also be applied to other areas such
as robotics. The underlying planning algorithm can be used for robots if a set of well-defined
actions is given as input. For example, if a set of actions is defined for manipulation or arm
reaching tasks, we can use the same framework to generate sequences of such motions. This can
be done for both virtual characters and robots.

We believe that our Precomputed Search Trees approach can be used to generate motions
for multiple characters. Currently, our system builds a tree for the motions of one character. It
should be possible to extend this for two characters or more. For example, the branches of the
tree can alternate between the motions of two characters, and executing the motions along a path
in the tree can correspond to the motions for both characters at the same time.

For both our planning frameworks, we can further test their capabilities by using them to
generate as many characters’ motion as possible. Is there a limit to the number of characters that
can be generated? What is the bottleneck, and can we do anything to deal with the bottleneck to
extend the existing systems?

For our variation technique, we believe that our work is just the beginning and there are many
open issues within the overall problem (some of which we describe below). There has been some
recent work on this problem and we expect to see more on this topic in the future.

One  interesting  area  for  future  work  is  to  provide  a  method  for  the  user  to  eontrol  the  variation 
that  is  generated.  A  simple  way  to  “eontrol”  the  output  motion  is  simply  by  taking  different  input 
data  to  begin  with.  If  the  motion  is  jumping  and  we  have  input  data  that  has  large  variations  in 
the  swinging  of  the  arms,  then  the  synthesized  motions  will  also  have  large  variations  in  the  arm 
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swing.  If  the  input  data  has  more  variation  in  the  head  movement,  the  synthesized  motions  will 
have  more  variation  in  the  head.  Another  method  that  we  have  tried  is  to  use  parameters  such 
as  the  kernel  width  to  tune  the  variation  that  can  be  generated.  While  this  provides  the  ability 
to  generate  variants  that  are  closer  to  or  further  away  from  the  mean  motion,  it  can  be  difficult 
to  guarantee  that  the  motion  will  look  natural.  One  possible  challenge  is  therefore  to  enable  the 
user  to  more  intuitively  control  the  variation  in  a  motion  while  automatically  constraining  the 
output  to  lie  within  the  “natural”  range  of  movement. 
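
To illustrate how a kernel width can act as a variation knob, the following is a minimal sketch that assumes each input motion has been reduced to a short feature vector and that variants are drawn from a plain Gaussian kernel density estimate over those vectors. This is a generic stand-in and not the model used in this thesis; the names `sample_variant` and `mean_spread` and the example data are hypothetical.

```python
import math
import random


def sample_variant(inputs, kernel_width, rng):
    """Draw one variant from a Gaussian kernel density estimate.

    Assumption: each input motion is a list of floats (a feature vector).
    One input is picked uniformly at random, then every coordinate is
    perturbed by Gaussian noise with standard deviation `kernel_width`.
    """
    center = rng.choice(inputs)
    return [x + rng.gauss(0.0, kernel_width) for x in center]


def mean_spread(inputs, kernel_width, n_samples=2000, seed=0):
    """Average distance of sampled variants from the mean of the inputs."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    mean = [sum(m[d] for m in inputs) / len(inputs) for d in range(dim)]
    total = 0.0
    for _ in range(n_samples):
        v = sample_variant(inputs, kernel_width, rng)
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(v, mean)))
    return total / n_samples


# Four hypothetical input motions, each summarized by three features.
inputs = [
    [0.9, 0.1, 0.0],
    [1.1, -0.1, 0.1],
    [1.0, 0.2, -0.1],
    [1.2, 0.0, 0.05],
]

narrow = mean_spread(inputs, kernel_width=0.05)
wide = mean_spread(inputs, kernel_width=0.5)
```

In this sketch a wider kernel moves the sampled variants further from the mean motion, but nothing in it keeps a sample within the natural range of movement, which is the open problem described above.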

We  can  further  analyze  the  range  of  variation  that  can  be  created.  We  typically  use  four  input 
motions  because  it  is  the  smallest  number  that  works  well,  and  it  is  common  to  only  have  a  small 
number  of  motions  available.  We  have  tried  to  use  up  to  ten  input  motions,  and  we  can  create 
more  variants  with  ten  inputs  as  long  as  they  span  a  wider  range  of  space.  For  future  work,  we 
can  analyze  more  formally  the  range  of  variation  we  can  get  given  a  set  of  inputs.  Perhaps  we 
can  take  the  ten  inputs  and  reduce  them  to  six  (or  some  smaller  number)  because  we  can  still  use 
the  six  inputs  to  re-create  the  range  of  variation  that  the  ten  inputs  can  create. 
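
One crude way to probe whether a smaller set of inputs still covers the range of a larger set is sketched below. It assumes each input motion is summarized by a feature vector and uses greedy farthest-point selection with a nearest-neighbor coverage gap as a proxy for "range of variation". Both the proxy and the names `greedy_subset` and `coverage_gap` are illustrative assumptions, not a method evaluated in this thesis.

```python
import math


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_subset(motions, k):
    """Pick k indices by farthest-point selection.

    Starts from the motion farthest from the centroid, then repeatedly adds
    the motion whose nearest already-chosen motion is farthest away.
    """
    dim = len(motions[0])
    centroid = [sum(m[d] for m in motions) / len(motions) for d in range(dim)]
    chosen = [max(range(len(motions)), key=lambda i: dist(motions[i], centroid))]
    while len(chosen) < k:
        remaining = [i for i in range(len(motions)) if i not in chosen]
        nxt = max(
            remaining,
            key=lambda i: min(dist(motions[i], motions[j]) for j in chosen),
        )
        chosen.append(nxt)
    return chosen


def coverage_gap(motions, chosen):
    """Largest distance from any motion to its nearest kept motion."""
    return max(min(dist(m, motions[j]) for j in chosen) for m in motions)


# Ten hypothetical inputs in a 2D feature space, several nearly redundant.
motions = [
    [0.0, 0.0], [0.05, 0.0], [1.0, 0.0], [1.0, 0.05], [0.0, 1.0],
    [0.05, 1.0], [1.0, 1.0], [0.95, 1.0], [0.5, 0.5], [0.5, 0.55],
]

kept = greedy_subset(motions, 6)
gap = coverage_gap(motions, kept)
```

A small gap suggests that the discarded inputs lie close to the kept ones in this feature space. Whether the synthesized variants remain perceptually equivalent would still need to be verified.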

There  are  possibilities  for  performing  more  perception  experiments,  which  can  be  shorter- 
term  but  useful  future  work.  We  can  further  study  motion  variation  by  adjusting  these  variables: 
different  characters  (other  than  the  one  skeleton  that  we  have),  a  larger  number  of  characters, 
skinned  characters,  and  camera  viewpoint.  Do  these  variables  affect  how  users  perceive  motion 
variation?  In  addition,  the  level-of-detail  is  important  to  the  user’s  perception.  How  many  variants 
are  needed  at  different  levels?  Or  at  different  camera  viewpoints?  We  can  try  to  add  noise  to 
existing  motion  using  the  strawman  method  to  see  if  users  can  perceive  those  as  more  unnatural. 
These  are  all  possible  questions  that  can  be  explored. 

Another  area  of  future  work  that  can  be  a  longer  term  goal  is  to  use  the  idea  of  variation  to 
compress  motion  data.  If  we  can  say  that  a  set  of  motion  clips  are  variations  of  each  other,  it  may 
be  possible  to  discard  some  of  these  motions.  This  is  because  we  can  potentially  re-synthesize 
a  discarded  motion  from  the  remaining  motions,  since  the  discarded  one  is  a  variation  of  the 
remaining  ones. 